ABSTRACT

This paper introduces an extended iterative weighted least squares
procedure,* denoted by EIWLS, for solution of a classical problem in analysis
of scientific and technical information: estimation in the presence of
complications, of the coefficients for linearly independent component signals
in a linear model, from observations on the component signals and a composite
signal containing the linear model plus noise which is nonstationary and/or
correlated with unknown covariance matrix. An iterative weighted least
squares procedure, denoted by IWLS, is developed for estimation in the
absence of complications. Then IWLS is extended to perform the estimation
subject to: (1) estimators being consistent with a priori information
describing the random variation of coefficients over all possible states of
nature (e.g., all systems of a specified type from a production process);
(2) utilization of data from all pertinent channels in the estimation of
coefficients which appear in the linear models for more than one data
channel; and (3) replacement of the linear model by a new linear model
containing only representative component signals which are highly
descriptive, but not highly related, when there is a large number of
component signals in the linear model and some of them are highly related.
FORTRAN computer programs have been written to implement IWLS and EIWLS on
the IBM 7094 for the case of nonstationary and uncorrelated noise.


*The paper forms a portion of Chapter 2 in Computer Science and
Statistics: Partners in Progress, a forthcoming volume edited by
A. F. Goodman and N. R. Mann. It represents a current and revised
version of the author's "Estimation of Coefficients in a Linear Model by
Extended Iterative Weighted Least Squares," Autonetics Publication
X4-1290/32, North American Rockwell Corporation, August 1964. Section 1
has been revised, Section 7 and the References have been expanded
and brought up to date, and Section 8 has been summarized.
1. INTRODUCTION AND SUMMARY


Since the early 19th century, estimation of a linear model from data
subject to error has been a classical problem in analysis of scientific and
technical information. The least squares procedure for solution of the
problem, formulated by Gauss in 1802 and published by Legendre in 1806, is
essentially the first statistical technique developed for analysis of
information (Ref 41). However, the effectiveness of least squares and
related procedures mainly depends upon characteristics of the error.

This paper introduces an extended iterative weighted least squares
procedure, denoted by EIWLS, for solution of the classical problem in the
presence of complications. A complete description of the problem, which
may be termed a generalized statistical regression problem, is: estimation
in the presence of complications, of the coefficients for linearly independent
component signals in a linear model, from observations on the component
signals and a composite signal containing the linear model plus noise with
unknown characteristics. Component signals are sometimes called input or
independent variables, the composite signal is sometimes called an output
or dependent variable, and the noise is sometimes called a residual or
random error. Since EIWLS was developed originally for error analysis of
an inertial navigation system, the signal-and-noise terminology is employed.

Pertinent characteristics of the noise are contained in the square
array of noise variances and covariances, called the noise covariance
matrix. Noise is said to be stationary and uncorrelated when that matrix is
a constant multiple of the identity matrix, nonstationary and uncorrelated
when it is a diagonal matrix, stationary and correlated when each row of it
is a proper arrangement of elements in the first row, and nonstationary and
correlated otherwise.
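The four categories can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `classify_noise` and the tolerance `tol` are hypothetical, and "a proper arrangement of elements in the first row" is interpreted here as the entry in row h and column i depending only on |h - i|.

```python
def classify_noise(cov, tol=1e-12):
    """Classify a square covariance matrix (list of lists) into the four
    categories named in the text. Hypothetical helper, for illustration."""
    n = len(cov)
    # Uncorrelated: every off-diagonal element is (numerically) zero.
    diagonal = all(abs(cov[h][i]) <= tol
                   for h in range(n) for i in range(n) if h != i)
    # Stationary (assumed reading): element (h, i) depends only on |h - i|,
    # so each row is a shifted arrangement of the first row.
    shifted_rows = all(abs(cov[h][i] - cov[0][abs(h - i)]) <= tol
                       for h in range(n) for i in range(n))
    if diagonal and shifted_rows:
        return "stationary and uncorrelated"   # constant multiple of identity
    if diagonal:
        return "nonstationary and uncorrelated"
    if shifted_rows:
        return "stationary and correlated"
    return "nonstationary and correlated"


print(classify_noise([[2.0, 0.0], [0.0, 2.0]]))
print(classify_noise([[1.0, 0.0], [0.0, 3.0]]))
```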

Consider estimation in the absence of complications. If the noise
covariance matrix is known, then the optimum estimators of the coefficients
are the weighted least squares estimators determined by its inverse. If the
noise covariance matrix is not known, its estimation appears to be a
reasonable step toward estimation of the coefficients.

Estimation of the linear model is, itself, required to estimate the
noise covariance matrix. Goodman (Ref 1) presented an iterative weighted
least squares procedure, denoted by IWLS, to accomplish the estimation
when the noise is nonstationary and/or correlated (i.e., not stationary and
uncorrelated, in which case the least squares estimators are optimum)
with unknown covariance matrix. Briefly, IWLS:

1. Obtains the least squares estimators of the coefficients.

2.  Calculates  an  estimator  of  the  noise  covariance  matrix  by  using 
the  composite  signal  and  its  least  squares  estimator,  based  upon 
the  least  squares  estimators  of  the  coefficients,  to  estimate  the 
necessary  noise  variances  and  covariances. 

3.  Obtains  the  weighted  least  squares  estimators  of  the  coefficients 
which  are  determined  by  the  inverse  of  this  matrix  estimator. 


4.  Iteratively  repeats  2  and  3,  with  the  least  squares  estimators  of 
the  coefficients  replaced  by  the  latest  set  of  weighted  least  squares 
estimators  of  the  coefficients,  and  obtains  a  new  estimator  of  the 
noise  covariance  matrix  and  a  new  set  of  weighted  least  squares 
estimators  of  the  coefficients. 

5.  Continues  the  iteration  in  4  until  a  preassigned  level  of  stability 
is  attained. 

The complications and their treatment were also summarized in general
terms by Goodman (Ref 1). This paper is an extension of Ref 1 and
introduces an improved and extended IWLS. Improvement of IWLS, as presented
in Ref 1, involves improved estimation of the noise covariance matrix. In
the following paragraphs, each complication and the corresponding extension
of IWLS to EIWLS is briefly discussed.

Coefficients in the linear model are constant for a particular state of
nature (e.g., a particular system of a specified type from a production
process). However, they may vary randomly from one state of nature to
another. A priori information describing the coefficients' random variation,
over all possible states of nature (e.g., all systems of that specified type
from the production process), may exist from previous analysis; and the
estimators ought to be consistent with it. To ensure this, a modification of
IWLS permits the incorporation into the procedure of a priori information
concerning the means and covariance matrix of the coefficients.

Data  may  exist  from  several  data  channels,  and  a  coefficient  may 
appear  in  the  linear  models  for  more  than  one  data  channel.  The  data  from 
all  pertinent  channels  should  be  utilized  in  the  estimation  of  that  coefficient. 
This  may  be  accomplished  by  properly  arranging  the  coefficients  and  data 
from  all  channels  into  a  form  suitable  for  the  application  of  IWLS. 

There may be a large number of component signals in the linear model
and some of them, though linearly independent, may be highly related.
Component signals are called highly related in this paper when they possess
a high degree of linear dependence. For accuracy and ease of computation,
it is frequently desirable to replace the linear model by a new linear model
containing only representative component signals which are highly
descriptive, but not highly related, and to estimate the coefficients of the
representative component signals in the new linear model. To accomplish
this, the set of component signals is partitioned into subsets of highly
related ones and the appropriate weighted average of a subset is selected to
be the representative component signal for that subset in the new linear
model. The coefficients in the new linear model may then be estimated by
IWLS. In addition, an estimator for the coefficient of the representative
component signal for a subset is apportioned among the coefficients of
component signals in the subset, via the weighting scheme used in the
selection of that representative component signal.

Additional operations, such as pre-editing of data and inclusion of
estimates based upon previous sets of data into the a priori information, may
be added without too much difficulty.

Although EIWLS is applicable when the noise is nonstationary and/or
correlated, it has been programmed for the computer only in the case of
nonstationary and uncorrelated noise. Two FORTRAN computer programs,
the basic one (Ref 2) and another (Ref 3) using Efroymson's technique (Ref 4)
to preselect the representative component signals, have been written to
implement IWLS as presented in Ref 1; and a FORTRAN computer program
(Ref 5) has been written to implement EIWLS.

Iterative  statistical  procedures  such  as  EIWLS  extract  considerably 
more  information  from  the  data  than  do  noniterative,  closed-form  procedures 
such  as  least  squares.  In  view  of  recent  computer  hardware  and  software 
development,  iterative  procedures  have  been  feasible  to  implement  and 
evaluate  for  some  time  and  continue  to  become  more  so  with  the  passage  of 
time.  It  is  therefore  "penny-wise  and  pound-foolish"  not  to  utilize  EIWLS, 
when  dictated  by  theory,  statistical  tests  such  as  the  one  given  in  Ref  42,  or 
examination  of  the  data.  Indeed,  EIWLS  is  even  more  appropriate  today  than 
at  the  time  of  its  development. 

Since iterative statistical procedures bridge the gap between the two
extremes of noniterative, closed-form and optimum statistical procedures,
it is somewhat surprising that the development of a meaningful theory for
iterative statistics has been essentially neglected in favor of the
continued, and almost academic, characterization and comparison of the two
extremes. This is well illustrated by the survey of related literature in
Section 7. It is noteworthy that the quite recent Ref 42 contains an
iterative procedure which is closely related to IWLS, as well as an approach
to the construction of confidence intervals and statistical tests, when the
noise is nonstationary and uncorrelated.

Ref  41  recommends  the  consideration  of  four  questions  regarding  an 
iterative  statistical  procedure: 

1.  Under  what  conditions  does  the  iterative  procedure  converge? 

2.  How  rapidly  does  the  iterative  procedure  converge? 

3.  Under  what  conditions  does  the  iterative  procedure  converge  to 
the  proper  solution? 

4.  To  what  extent  does  the  iterative  procedure  improve  upon  a 
noniterative,  closed-form  procedure? 

Partial  answers  to  all  four  questions  are  provided  by  Sections  2-8. 

Those interested in only the essence of EIWLS may confine themselves
to Section 1. Sections 6, 7 and 8 augment Section 1 with a discussion, a
comprehensive survey of related literature, and an illustrative example. In
Sections 2 through 5, the statistical details of EIWLS (whose comprehension
may require careful reading by the nonstatistician) are displayed, with a
minimum of development and amplification, along with some reasonable
alternatives which are listed in footnotes.


2.  ITERATIVE  WEIGHTED  LEAST  SQUARES 


Suppose that t represents time or some other auxiliary variable.
At time t: $X_1(t), X_2(t), \ldots, X_p(t)$ denote linearly independent
component signals; $\beta_1, \beta_2, \ldots, \beta_p$ denote unknown
coefficients; $Y(t)$ denotes a composite signal; and $\epsilon(t)$ denotes
noise with mean zero, variance $\sigma^2(t)$ and covariance $\sigma(t, t')$
between $\epsilon(t)$ and $\epsilon(t')$. Then the linear model* is
$\sum_{j=1}^{p} \beta_j X_j(t)$ and the representation of $Y(t)$ as
containing the linear model plus noise is

$$Y(t) = \sum_{j=1}^{p} \beta_j X_j(t) + \epsilon(t). \tag{1}$$

Given the explicit observations $X_{ji}$ of $X_j(t)$ for j = 1, 2, ..., p and
$Y_i$ of $Y(t)$, which imply the implicit observation
$\epsilon_i = Y_i - \sum_{j=1}^{p} \beta_j X_{ji}$ of $\epsilon(t)$, at time
$t_i$ for i = 1, 2, ..., n > p, the generalized statistical regression
problem without the complicating restrictions is the estimation of
$\beta_1, \beta_2, \ldots, \beta_p$ when
$\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ have mean zero and unknown
n-by-n covariance matrix $\Sigma$ of variances
$\sigma_{ii} = \sigma_i^2 = \sigma^2(t_i)$ and covariances
$\sigma_{hi} = \sigma(t_h, t_i)$.


Complicated expressions in this and subsequent paragraphs may
be written in compact form by the introduction of matrix notation. Let
$X_j = (X_{j1}, X_{j2}, \ldots, X_{jn})'$ for j = 1, 2, ..., p,
$X = (X_1, X_2, \ldots, X_p)$, $Y = (Y_1, Y_2, \ldots, Y_n)'$,
$\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ and
$\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)'$. Then the matrix
alter ego of Equation (1) is

$$Y = X\beta + \epsilon, \tag{2}$$

and the matrix alter ego of the generalized statistical regression
problem without the complicating restrictions is the estimation of
$\beta$ given X and Y, when $\Sigma$ is unknown.

*A constant term may be included in the linear model by setting
$X_1(t)$ identically equal to one.


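The weighted least squares estimators of $\beta_1, \beta_2, \ldots, \beta_p$
which are determined by the n-by-n matrix

$$W = \begin{pmatrix}
w_{11} & w_{12} & \cdots & w_{1n} \\
w_{12} & w_{22} & \cdots & w_{2n} \\
\vdots & \vdots &        & \vdots \\
w_{1n} & w_{2n} & \cdots & w_{nn}
\end{pmatrix}$$

of weights $w_{hi}$ (or any matrix that is a constant multiple of W) are
those minimizing the quadratic form

$$Q_W = \sum_{h=1}^{n} \sum_{i=1}^{n} w_{hi}
\Bigl(Y_h - \sum_{j=1}^{p} \beta_j X_{jh}\Bigr)
\Bigl(Y_i - \sum_{j=1}^{p} \beta_j X_{ji}\Bigr)
= (Y - X\beta)' W (Y - X\beta).$$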


It may be shown that these estimators are given by

$$\hat\beta_W = (\hat\beta_{W1}, \hat\beta_{W2}, \ldots, \hat\beta_{Wp})'
= (X'WX)^{-1} X'WY$$

and have covariance matrix $(X'WX)^{-1} X'W \Sigma W X (X'WX)^{-1}$. They
are called the least squares estimators if W is the n-by-n identity
matrix $I_n$.
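The intermediate steps behind "it may be shown" can be sketched as follows. This is a standard derivation supplied here for convenience, assuming W is symmetric, X'WX is nonsingular, and X is treated as fixed; the label $Q_W$ for the quadratic form is used only for this sketch.

```latex
\begin{aligned}
Q_W(\beta) &= (Y - X\beta)'\,W\,(Y - X\beta), \\
\frac{\partial Q_W}{\partial \beta} &= -2\,X'W\,(Y - X\beta) = 0
  \quad\Longrightarrow\quad X'WX\,\hat\beta_W = X'WY, \\
\hat\beta_W &= (X'WX)^{-1}X'WY
  = \beta + (X'WX)^{-1}X'W\,\epsilon
  \quad\text{since } Y = X\beta + \epsilon, \\
\operatorname{Cov}(\hat\beta_W)
  &= (X'WX)^{-1}X'W\,\Sigma\,WX\,(X'WX)^{-1}
  \quad\text{since } \operatorname{Cov}(\epsilon) = \Sigma .
\end{aligned}
```

With $W = \Sigma^{-1}$ the covariance matrix collapses to $(X'\Sigma^{-1}X)^{-1}$.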


In the sense of possessing minimum variance among all unbiased
linear estimators, of providing an estimator of
$\sum_{j=1}^{p} \beta_j X_{ji}$ which has minimum variance among all
unbiased linear estimators of it and of being maximum likelihood estimators
when the noise is normally distributed, the optimum estimators of
$\beta_1, \beta_2, \ldots, \beta_p$ are the weighted least squares
estimators which are determined by $\Sigma^{-1}$
(Ref 6, Chapter 14 and Ref 7, Sections 1.3-1.5). The least squares
estimators are, therefore, optimum only if the noise is stationary
and uncorrelated, and $\Sigma$ is a constant multiple of the identity matrix.

For nonstationary and/or correlated noise, it is not unreasonable to
presume that estimation of $\Sigma$ and then estimation of
$\beta_1, \beta_2, \ldots, \beta_p$ by weighted least squares is superior
to estimation of $\beta_1, \beta_2, \ldots, \beta_p$ by least squares.
Estimation of $\sum_{j=1}^{p} \beta_j X_{ji}$, for i = 1, 2, ..., n is,
however, required to estimate $\Sigma$. This may be accomplished via
estimation of $\beta_1, \beta_2, \ldots, \beta_p$. Hence, iterative
estimation of $\beta_1, \beta_2, \ldots, \beta_p$ and then $\Sigma$, as
accomplished by IWLS, should provide improvement over least squares
estimation of $\beta_1, \beta_2, \ldots, \beta_p$.


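Stated in symbols, IWLS:

1. Obtains the least squares estimators
   $\hat\beta_1^{(0)}, \hat\beta_2^{(0)}, \ldots, \hat\beta_p^{(0)}$ from

   $$\hat\beta^{(0)} = (\hat\beta_1^{(0)}, \hat\beta_2^{(0)}, \ldots,
   \hat\beta_p^{(0)})' = (X'X)^{-1} X'Y. \tag{3}$$

2. Calculates an estimator $\hat\Sigma^{(1)}$, by using $Y_i$ and its least
   squares estimator

   $$\hat Y_i^{(0)} = \sum_{j=1}^{p} \hat\beta_j^{(0)} X_{ji} \tag{4}$$

   for i = 1, 2, ..., n in an appropriate estimation scheme to
   calculate the necessary $\hat\sigma_i^{(1)2}$ and $\hat\sigma_{hi}^{(1)}$.

3. Obtains the weighted least squares estimators
   $\hat\beta_1^{(1)}, \hat\beta_2^{(1)}, \ldots, \hat\beta_p^{(1)}$ which
   are determined by $(\hat\Sigma^{(1)})^{-1}$ from

   $$\hat\beta^{(1)} = (\hat\beta_1^{(1)}, \hat\beta_2^{(1)}, \ldots,
   \hat\beta_p^{(1)})'
   = [X'(\hat\Sigma^{(1)})^{-1}X]^{-1} X'(\hat\Sigma^{(1)})^{-1}Y. \tag{5}$$

4. Iteratively repeats 2 and 3, with
   $\hat\beta_1^{(0)}, \hat\beta_2^{(0)}, \ldots, \hat\beta_p^{(0)}$ and
   $\hat Y_i^{(0)}$ for i = 1, 2, ..., n replaced by the latest set of
   weighted least squares estimators
   $\hat\beta_1^{(c)}, \hat\beta_2^{(c)}, \ldots, \hat\beta_p^{(c)}$ and

   $$\hat Y_i^{(c)} = \sum_{j=1}^{p} \hat\beta_j^{(c)} X_{ji} \tag{6}$$

   for i = 1, 2, ..., n, and obtains a new estimator $\hat\Sigma^{(c+1)}$ and
   a new set of weighted least squares estimators
   $\hat\beta_1^{(c+1)}, \hat\beta_2^{(c+1)}, \ldots, \hat\beta_p^{(c+1)}$
   from

   $$\hat\beta^{(c+1)} = (\hat\beta_1^{(c+1)}, \hat\beta_2^{(c+1)}, \ldots,
   \hat\beta_p^{(c+1)})'
   = [X'(\hat\Sigma^{(c+1)})^{-1}X]^{-1} X'(\hat\Sigma^{(c+1)})^{-1}Y.
   \tag{7}$$

5. Continues the iteration in 4 for c = 1, 2, ..., c*-1, where
   c* either is determined when a valid measure of the change
   in $\hat\beta_1^{(c)}, \hat\beta_2^{(c)}, \ldots, \hat\beta_p^{(c)}$
   becomes less than a preassigned constant or is, itself, a preassigned
   constant. An estimator for the covariance matrix of
   $\hat\beta_1^{(c)}, \hat\beta_2^{(c)}, \ldots, \hat\beta_p^{(c)}$ is
   given by

   $$[X'(\hat\Sigma^{(c)})^{-1}X]^{-1} X'(\hat\Sigma^{(c)})^{-1}
   \hat\Sigma^{(c+1)} (\hat\Sigma^{(c)})^{-1} X
   [X'(\hat\Sigma^{(c)})^{-1}X]^{-1},$$

   which simplifies to $[X'(\hat\Sigma^{(c)})^{-1}X]^{-1}$ when
   $\hat\Sigma^{(c+1)} = \hat\Sigma^{(c)}$, for $\hat\Sigma^{(0)} = I_n$ and
   c = 0, 1, ..., c*.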

The selection of an appropriate estimation scheme to use in 2
is influenced not only by the type of noise and the data, but also by
statistical considerations.

When the noise is nonstationary and uncorrelated,
$\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$.
An estimator of $\sigma_i^2$ is provided by

$$\hat\sigma_i^{(c)2} = (Y_i - \hat Y_i^{(c-1)})^2
\quad\text{for } c = 1, 2, \ldots, c^* \text{ and } i = 1, 2, \ldots, n.
\tag{8}$$

These estimators are somewhat unsatisfactory because each of them
is based upon only one observation; and they should be combined to
produce more satisfactory estimators. In most applications, it is
reasonable to assume a linear model for the variation of $\sigma^2(t)$. Let
$T_1(t), T_2(t), \ldots, T_P(t)$ be known functions of t and
$\nu_1, \nu_2, \ldots, \nu_P$ be unknown coefficients with p + P < n; and
assume

$$\sigma^2(t) = \sum_{k=1}^{P} \nu_k T_k(t).$$

A simple version of $\sum_{k=1}^{P} \nu_k T_k(t)$, which is employed by the
computer program, is $\sum_{k=1}^{P} \nu_k t^{k-1}$. Suppose that
$T_{ki} = T_k(t_i)$ for i = 1, 2, ..., n and k = 1, 2, ..., P, that T is
the n-by-P matrix whose ith row is $(T_{1i}, T_{2i}, \ldots, T_{Pi})$, and
that $\hat\sigma^{(c)2} = (\hat\sigma_1^{(c)2}, \hat\sigma_2^{(c)2}, \ldots,
\hat\sigma_n^{(c)2})'$. Then an appropriate estimation scheme* to use in 2 is

$$\hat\nu^{(1)} = (\hat\nu_1^{(1)}, \hat\nu_2^{(1)}, \ldots,
\hat\nu_P^{(1)})' = (T'T)^{-1} T' \hat\sigma^{(1)2}, \tag{9}$$

$$\hat\nu^{(c)} = (\hat\nu_1^{(c)}, \hat\nu_2^{(c)}, \ldots,
\hat\nu_P^{(c)})'
= \bigl[T'[(\hat\Sigma^{(c-1)})^{-1}]^2 T\bigr]^{-1}
T'[(\hat\Sigma^{(c-1)})^{-1}]^2 \hat\sigma^{(c)2} \tag{10}$$

for c = 2, 3, ..., c* and

$$\hat\Sigma^{(c)} = \mathrm{diag}\Bigl(\sum_{k=1}^{P} \hat\nu_k^{(c)} T_{k1},
\sum_{k=1}^{P} \hat\nu_k^{(c)} T_{k2}, \ldots,
\sum_{k=1}^{P} \hat\nu_k^{(c)} T_{kn}\Bigr) \tag{11}$$

for c = 1, 2, ..., c*. It might be observed that the use of Equations
(9)-(11) produces a simple iterative procedure for obtaining a
solution to the maximum likelihood equations*, whose accuracy may
be checked, for nonstationary, uncorrelated noise which is normally
distributed with $\sigma^2(t) = \sum_{k=1}^{P} \nu_k T_k(t)$. To establish an
upper limit for the elements of $(\hat\Sigma^{(c)})^{-1}$, the estimator
$\sum_{k=1}^{P} \hat\nu_k^{(c)} T_{ki}$ of $\sigma_i^2$ must be bounded away
from zero by a lower limit.

*This estimation scheme is preferable to

$$\tilde\sigma_i^{(c)2} = \begin{cases}
\dfrac{1}{M+i} \displaystyle\sum_{h=1}^{i+M} \hat\sigma_h^{(c)2}
  & \text{for } i = 1, 2, \ldots, M, \\[2ex]
\dfrac{1}{2M+1} \displaystyle\sum_{h=i-M}^{i+M} \hat\sigma_h^{(c)2}
  & \text{for } i = M+1, M+2, \ldots, n-M, \\[2ex]
\dfrac{1}{n+M-i+1} \displaystyle\sum_{h=i-M}^{n} \hat\sigma_h^{(c)2}
  & \text{for } i = n-M+1, n-M+2, \ldots, n,
\end{cases}$$

for c = 1, 2, ..., c*, which is the suitably truncated running average of
2M+1 estimators $\hat\sigma_h^{(c)2}$ that was suggested by Ref 1, when it
is reasonable to assume a linear model for the variation of $\sigma^2(t)$;
and the estimation scheme suggested by Ref 1 is preferable, when such an
assumption is not reasonable.
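For readers who want to experiment with the running-average alternative attributed to Ref 1 in the footnote, a minimal Python sketch follows. The function name `running_average` and the 0-based indexing are choices made for this illustration only; the window is simply cut off at the first and last observations, which is how the truncation is understood here.

```python
def running_average(values, M):
    """Truncated running average of up to 2M + 1 neighbouring estimators.

    values : list of single-observation variance estimators
    M      : half-width of the averaging window (hypothetical parameter name)
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - M)        # window is cut off at the first observation
        hi = min(n - 1, i + M)    # and at the last observation
        window = values[lo:hi + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


print(running_average([1.0, 2.0, 3.0, 4.0, 5.0], 1))
```

Interior points average 2M + 1 values, while points within M of either end average fewer, in line with the divisors M + i and n + M - i + 1.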

Consider the case of stationary, correlated noise, and let the
observations be taken at different and equally spaced times with
$t_i = t_0 + i\,\Delta t$. Then $\sigma(t, t') = \sigma(\Delta)$ is a
function of only the separation time $\Delta = |t - t'|$,
$\sigma_{hi} = \sigma(r)$ is a function of only $r = |h - i|$ and

$$\Sigma = \begin{pmatrix}
\sigma(0)   & \sigma(1)   & \sigma(2)   & \cdots & \sigma(n-1) \\
\sigma(1)   & \sigma(0)   & \sigma(1)   & \cdots & \sigma(n-2) \\
\sigma(2)   & \sigma(1)   & \sigma(0)   & \cdots & \sigma(n-3) \\
\vdots      & \vdots      & \vdots      &        & \vdots      \\
\sigma(n-1) & \sigma(n-2) & \sigma(n-3) & \cdots & \sigma(0)
\end{pmatrix}.$$

An estimator** of $\sigma(r)$ is

$$\hat\sigma^{(c)}(r) = \frac{1}{n-r} \sum_{i=1}^{n-r}
(Y_i - \hat Y_i^{(c-1)})(Y_{i+r} - \hat Y_{i+r}^{(c-1)}) \tag{12}$$

for c = 1, 2, ..., c* and r = 0, 1, ..., n-1.


*The investigation which yielded Equations (9)-(11) was partially
prompted by the conjecture of Dr. T. L. Gunckel that IWLS, as
presented in Ref 1, might provide an iterative solution to the
maximum likelihood equations.

**An alternative estimator, which takes the estimation of
$\beta_1, \beta_2, \ldots, \beta_p$ explicitly into account, divides the
sum by n-p-r and limits r to r = 0, 1, ..., n-p-1.
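The lagged-product estimator of Equation (12) can be sketched in a few lines of Python. The names `autocovariance` and `toeplitz_covariance`, and the sample residuals, are hypothetical and serve only to illustrate the arithmetic; the residuals stand for the differences between each observation and its latest fitted value.

```python
def autocovariance(residuals, r):
    """Equation (12): sum of lagged residual products divided by n - r."""
    n = len(residuals)
    if not 0 <= r < n:
        raise ValueError("lag r must satisfy 0 <= r <= n - 1")
    total = sum(residuals[i] * residuals[i + r] for i in range(n - r))
    return total / (n - r)


def toeplitz_covariance(residuals):
    """Arrange the estimates for lags |h - i| into an n-by-n matrix."""
    n = len(residuals)
    sigma = [autocovariance(residuals, r) for r in range(n)]
    return [[sigma[abs(h - i)] for i in range(n)] for h in range(n)]


residuals = [1.0, -1.0, 1.0, -1.0]   # hypothetical residuals
print([autocovariance(residuals, r) for r in range(len(residuals))])
```

Estimates at large lags rest on very few products, and a matrix assembled directly from them is not guaranteed to be positive definite, which is one practical reason for smoothing the estimates with a model before inversion.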




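It is usually reasonable to assume a linear model for the variation of
$\sigma(\Delta)$*. Suppose that
$T_1(\Delta), T_2(\Delta), \ldots, T_P(\Delta)$ are known functions
of $\Delta$, $\nu_1, \nu_2, \ldots, \nu_P$ are unknown coefficients with
p + P < n and

$$\sigma(\Delta) = \sum_{k=1}^{P} \nu_k T_k(\Delta),$$

whose simple version is

$$\sigma(\Delta) = \sum_{k=1}^{P} \nu_k \Delta^{k-1}.$$

If $T_{ki} = T_k((i-1)\Delta t)$ for i = 1, 2, ..., n and k = 1, 2, ..., P,
T is the n-by-P matrix whose ith row is $(T_{1i}, T_{2i}, \ldots, T_{Pi})$,

$$\hat\sigma^{(c)} = \bigl(\hat\sigma^{(c)}(0), \hat\sigma^{(c)}(1), \ldots,
\hat\sigma^{(c)}(n-1)\bigr)'$$

and

$$N = \mathrm{diag}(n, n-1, \ldots, 1),$$

then an appropriate estimation scheme to use in 2 is given by

$$\hat\nu^{(c)} = (\hat\nu_1^{(c)}, \hat\nu_2^{(c)}, \ldots,
\hat\nu_P^{(c)})' = (T'NT)^{-1} T'N \hat\sigma^{(c)} \tag{13}$$

and

$$\hat\Sigma^{(c)} = \begin{pmatrix}
\sum_k \hat\nu_k^{(c)} T_{k1} & \sum_k \hat\nu_k^{(c)} T_{k2} &
  \sum_k \hat\nu_k^{(c)} T_{k3} & \cdots & \sum_k \hat\nu_k^{(c)} T_{kn} \\
\sum_k \hat\nu_k^{(c)} T_{k2} & \sum_k \hat\nu_k^{(c)} T_{k1} &
  \sum_k \hat\nu_k^{(c)} T_{k2} & \cdots & \sum_k \hat\nu_k^{(c)} T_{k,n-1} \\
\sum_k \hat\nu_k^{(c)} T_{k3} & \sum_k \hat\nu_k^{(c)} T_{k2} &
  \sum_k \hat\nu_k^{(c)} T_{k1} & \cdots & \sum_k \hat\nu_k^{(c)} T_{k,n-2} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum_k \hat\nu_k^{(c)} T_{kn} & \sum_k \hat\nu_k^{(c)} T_{k,n-1} &
  \sum_k \hat\nu_k^{(c)} T_{k,n-2} & \cdots & \sum_k \hat\nu_k^{(c)} T_{k1}
\end{pmatrix} \tag{14}$$

for c = 1, 2, ..., c*, where each sum runs over k = 1, 2, ..., P.


*Either of these estimators may be used, as is, when it is not reasonable to
assume a linear model for the variation of $\sigma(\Delta)$.
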

The weight matrix N is used in Equation (13) because it constitutes a
simple means of reflecting the increase of statistical dependability
in $\hat\sigma^{(c)}(r)$ with n-r. It is necessary to require that all
estimators $\sum_{k=1}^{P} \hat\nu_k^{(c)} T_{k,r+1}$ of $\sigma(r)$ be
bounded above by the estimator $\sum_{k=1}^{P} \hat\nu_k^{(c)} T_{k1}$ of
$\sigma(0)$. A common engineering model for the variation of
$\sigma(\Delta)$ is

$$\sigma(\Delta) = \sigma_0 e^{-\nu\Delta},$$

which is nonlinear in the unknown coefficients $\sigma_0$ and $\nu$. This
exponential variation of $\sigma(\Delta)$ may be treated* by transforming the
model to the linear one,

$$\log_e \sigma(\Delta) = \log_e \sigma_0 - \nu\Delta = \nu_0 - \nu\Delta,$$

and proceeding as above with $\hat\sigma^{(c)}(r)$, $T_{1i}$, $T_{2i}$,
$\nu_1$ and $\nu_2$ being replaced by $\log_e \hat\sigma^{(c)}(r)$, 1,
$(i-1)\Delta t$, $\nu_0$ and $-\nu$, respectively.
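A minimal Python sketch of the logarithmic transformation is given below. It assumes the weights n - r of the matrix N carry over to the transformed fit, and it uses only strictly positive covariance estimates because the logarithm is otherwise undefined; both points, like the function name `fit_exponential_covariance`, are assumptions of this sketch rather than statements from the text.

```python
import math


def fit_exponential_covariance(sigma_hat, dt):
    """Fit sigma(D) = sigma0 * exp(-nu * D) by weighted least squares on logs.

    sigma_hat[r] is an estimate of the covariance at separation r * dt.
    Returns (sigma0, nu). Hypothetical helper, for illustration only.
    """
    n = len(sigma_hat)
    # (separation, log covariance, weight n - r) for usable lags only
    pts = [(r * dt, math.log(s), n - r)
           for r, s in enumerate(sigma_hat) if s > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive covariance estimates")
    sw = sum(w for _, _, w in pts)
    mean_d = sum(w * d for d, _, w in pts) / sw
    mean_z = sum(w * z for _, z, w in pts) / sw
    sdd = sum(w * (d - mean_d) ** 2 for d, _, w in pts)
    sdz = sum(w * (d - mean_d) * (z - mean_z) for d, z, w in pts)
    slope = sdz / sdd                     # estimates -nu
    intercept = mean_z - slope * mean_d   # estimates log(sigma0)
    return math.exp(intercept), -slope


# Hypothetical, noise-free covariances generated from sigma0 = 2, nu = 0.5
lags = [2.0 * math.exp(-0.5 * r * 0.1) for r in range(6)]
print(fit_exponential_covariance(lags, 0.1))
```

Fitting on the logarithmic scale changes the error structure, so the result is best regarded as a convenient starting value rather than an optimum estimate.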


A proper modification of Equations (12)-(14) would eliminate
the restriction of observations to different and equally spaced times.
In addition, an appropriate combination of the techniques used for
nonstationary, uncorrelated noise and for stationary, correlated
noise may be employed in the event of nonstationary, correlated
noise. The calculations would then become more complicated and,
perhaps in some cases, prohibitive.

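Inversion of $\hat\Sigma^{(c)}$ in 3 must now be accomplished. If the noise
is nonstationary and uncorrelated and
$\sigma^2(t) = \sum_{k=1}^{P} \nu_k T_k(t)$, $(\hat\Sigma^{(c)})^{-1}$ may
be written in closed form as

$$(\hat\Sigma^{(c)})^{-1} = \mathrm{diag}\Bigl(
\frac{1}{\sum_{k=1}^{P} \hat\nu_k^{(c)} T_{k1}},
\frac{1}{\sum_{k=1}^{P} \hat\nu_k^{(c)} T_{k2}}, \ldots,
\frac{1}{\sum_{k=1}^{P} \hat\nu_k^{(c)} T_{kn}}\Bigr) \tag{15}$$

for c = 1, 2, ..., c*.


*If preferred, $\sigma_0 e^{-\nu\Delta}$ may be approximated by
$\sum_{k=1}^{P} \nu_k \Delta^{k-1}$ and $\nu_1, \nu_2, \ldots, \nu_P$ may be
estimated as in Equation (13); or $\sigma_0$ and $\nu$ may be estimated by
those (nonlinear) estimators that minimize
$\sum_{i=1}^{n} (n-i+1)\bigl[\hat\sigma^{(c)}(i-1)
- \sigma_0 e^{-\nu(i-1)\Delta t}\bigr]^2$, but are difficult to compute.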

The performance of this inversion has not yet been investigated for
correlated noise.


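A valid, yet simple, measure* of the change in
$\hat\beta_1^{(c)}, \hat\beta_2^{(c)}, \ldots, \hat\beta_p^{(c)}$ for 5 is

$$\max_{j = 1, 2, \ldots, p}
\left| \frac{\hat\beta_j^{(c)} - \hat\beta_j^{(c-1)}}{\hat\beta_j^{(c)}}
\right|.$$


*Depending upon the application,
$\max_{i = 1, 2, \ldots, n} \left| (\hat Y_i^{(c)} - \hat Y_i^{(c-1)}) /
\hat Y_i^{(c)} \right|$ may be a more desirable measure of this change.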
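To make the procedure of this section concrete, the following Python sketch runs IWLS for nonstationary, uncorrelated noise in a deliberately small special case: a straight-line model Y = b1 + b2 X + noise with the variance model sigma^2(t) = v1 + v2 t. These restrictions, the names `weighted_line_fit` and `iwls_line`, and the parameters `max_iter`, `tol` and `floor` are assumptions of the sketch; it is not a transcription of the FORTRAN programs of Refs 2, 3 and 5.

```python
def weighted_line_fit(u, z, w):
    """Weighted least squares fit of z = a + b*u; returns (a, b)."""
    sw = sum(w)
    mu = sum(wi * ui for wi, ui in zip(w, u)) / sw
    mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
    suu = sum(wi * (ui - mu) ** 2 for wi, ui in zip(w, u))
    suz = sum(wi * (ui - mu) * (zi - mz) for wi, ui, zi in zip(w, u, z))
    b = suz / suu
    return mz - b * mu, b


def iwls_line(t, x, y, max_iter=20, tol=1e-8, floor=1e-6):
    """IWLS sketch: returns (coefficients, fitted variances, iterations)."""
    n = len(y)
    var = [1.0] * n                          # start from the identity matrix
    beta = weighted_line_fit(x, y, var)      # step 1: least squares
    c = 0
    for c in range(1, max_iter + 1):
        # step 2: squared residuals, then a weighted fit of the variance
        # model; weights are reciprocals of the previous variances squared
        sq = [(yi - beta[0] - beta[1] * xi) ** 2 for xi, yi in zip(x, y)]
        v1, v2 = weighted_line_fit(t, sq, [1.0 / s ** 2 for s in var])
        # fitted variances are bounded away from zero by a lower limit
        var = [max(v1 + v2 * ti, floor) for ti in t]
        # step 3: weighted least squares with weights 1 / variance
        new_beta = weighted_line_fit(x, y, [1.0 / s for s in var])
        # step 5: largest relative change in the coefficients
        change = max(abs(nb - ob) / max(abs(nb), 1e-300)
                     for nb, ob in zip(new_beta, beta))
        beta = new_beta
        if change < tol:
            break
    return beta, var, c


t = [float(i) for i in range(1, 11)]
y = [1.0 + 2.0 * ti for ti in t]             # hypothetical noise-free data
print(iwls_line(t, t, y))
```

The same loop carries over to p coefficients and P variance terms once the two closed-form line fits are replaced by general weighted least squares solutions.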

3. INCORPORATION OF A PRIORI INFORMATION

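Let $\beta_1, \beta_2, \ldots, \beta_p$ vary randomly over all possible
states of nature with means $\beta_1^*, \beta_2^*, \ldots, \beta_p^*$ and
covariance matrix $\Gamma$ of variances $\gamma_j^2 = \gamma_{jj}$ and
covariances $\gamma_{jk}$. Introducing matrix notation yields

$$\beta^* = (\beta_1^*, \beta_2^*, \ldots, \beta_p^*)'$$

and

$$\Gamma = \begin{pmatrix}
\gamma_{11} & \gamma_{12} & \cdots & \gamma_{1p} \\
\gamma_{12} & \gamma_{22} & \cdots & \gamma_{2p} \\
\vdots      & \vdots      &        & \vdots      \\
\gamma_{1p} & \gamma_{2p} & \cdots & \gamma_{pp}
\end{pmatrix}.$$

The a priori information, which exists from previous analysis, is knowledge
of $\beta^*$ and $\Gamma$.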

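Two quite reasonable methods to ensure that estimators of
$\beta_1, \beta_2, \ldots, \beta_p$ will be consistent with the a priori
information are: incorporation of $\beta^*$ and $\Gamma$ into the quadratic
form to be minimized by the estimators; and employment of a linear
combination $\Lambda\hat\beta^{(c^*)} + (I_p - \Lambda)\beta^*$ (a special
case of which is $\lambda\hat\beta^{(c^*)} + (1 - \lambda)\beta^*$, with
$\Lambda = \lambda I_p$ and $0 < \lambda < 1$) as the vector of
estimators, where $\Lambda$ is a matrix which measures one's relative
confidence in the data (as represented by $\hat\beta^{(c^*)}$) and the
a priori information (as represented by $\beta^*$). Suppose

$$Q^* = (Y - X\beta)'(\hat\Sigma^{(c^*)})^{-1}(Y - X\beta)
+ (\beta - \beta^*)'\Gamma^{-1}(\beta - \beta^*)$$

is the quadratic form that incorporates $\beta^*$ and $\Gamma$ and is to be
minimized*. Then it may be observed that the corresponding estimators
$\hat\beta_1^*, \hat\beta_2^*, \ldots, \hat\beta_p^*$ which incorporate
$\beta^*$ and $\Gamma$ are given by

$$\begin{aligned}
\hat\beta^* &= (\hat\beta_1^*, \hat\beta_2^*, \ldots, \hat\beta_p^*)' \\
&= [X'(\hat\Sigma^{(c^*)})^{-1}X + \Gamma^{-1}]^{-1}
   [X'(\hat\Sigma^{(c^*)})^{-1}Y + \Gamma^{-1}\beta^*] \\
&= [X'(\hat\Sigma^{(c^*)})^{-1}X + \Gamma^{-1}]^{-1}
   [X'(\hat\Sigma^{(c^*)})^{-1}X\hat\beta^{(c^*)} + \Gamma^{-1}\beta^*] \\
&= \Lambda\hat\beta^{(c^*)} + (I_p - \Lambda)\beta^*,
\end{aligned} \tag{16}$$

where $\Lambda$ is the p-by-p matrix of elements $\lambda_{jk}$ defined as

$$\Lambda = [X'(\hat\Sigma^{(c^*)})^{-1}X + \Gamma^{-1}]^{-1}
[X'(\hat\Sigma^{(c^*)})^{-1}X]. \tag{17}$$

Note that $\hat\beta^{(c^*)}$ and $\hat\Sigma^{(c^*)}$ may be replaced by
$\hat\beta^{(0)}$ and $I_n$ if it is desired to incorporate the a priori
information into least squares rather than IWLS.

It might easily be shown that
$\hat\beta_1^*, \hat\beta_2^*, \ldots, \hat\beta_p^*$ would be the
maximum likelihood estimators if $\Sigma$ and $\hat\Sigma^{(c^*)}$ were
equal for both the noise and the coefficients being normally distributed.


*If $0 < \lambda < 1$, then two other quadratic forms that incorporate
$\beta^*$ and $\Gamma$ and might be minimized are:

$$Q_{\odot} = \lambda (Y - X\beta)'(\hat\Sigma^{(c^*)})^{-1}(Y - X\beta)
+ (1 - \lambda)(X\beta - X\beta^*)'(\hat\Sigma^{(c^*)})^{-1}
(X\beta - X\beta^*),$$

with the corresponding estimators
$\hat\beta_1^{\odot}, \hat\beta_2^{\odot}, \ldots, \hat\beta_p^{\odot}$ that
incorporate $\beta^*$ and $\Gamma$ being yielded by

$$\hat\beta^{\odot} = (\hat\beta_1^{\odot}, \hat\beta_2^{\odot}, \ldots,
\hat\beta_p^{\odot})' = \lambda\hat\beta^{(c^*)} + (1 - \lambda)\beta^*;$$

and

$$Q_{\triangle} = \lambda (Y - X\beta)'(\hat\Sigma^{(c^*)})^{-1}(Y - X\beta)
+ (1 - \lambda)(\beta - \beta^*)'\Gamma^{-1}(\beta - \beta^*),$$

with the corresponding estimators
$\hat\beta_1^{\triangle}, \hat\beta_2^{\triangle}, \ldots,
\hat\beta_p^{\triangle}$ that incorporate $\beta^*$ and $\Gamma$ being
yielded by

$$\hat\beta^{\triangle}
= [\lambda X'(\hat\Sigma^{(c^*)})^{-1}X + (1 - \lambda)\Gamma^{-1}]^{-1}
[\lambda X'(\hat\Sigma^{(c^*)})^{-1}Y + (1 - \lambda)\Gamma^{-1}\beta^*].$$

One such $\lambda$ is

$$\lambda = \|\Lambda\| / (\|\Lambda\| + \|I_p - \Lambda\|)$$

for $\|A\|$ being a norm of A.
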
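The way the a priori information pulls the estimate toward the prior mean is easiest to see with a single coefficient (p = 1), where every matrix reduces to a number. The Python sketch below is such an illustration; the function name `prior_weighted_estimate` and its argument names are hypothetical, and the noise covariance estimate is assumed to be diagonal.

```python
def prior_weighted_estimate(x, y, var, prior_mean, prior_var):
    """Scalar (p = 1) instance of the combination of data and prior.

    x, y       : observations of the component and composite signals
    var        : estimated noise variances (diagonal noise covariance)
    prior_mean : a priori mean of the coefficient
    prior_var  : a priori variance of the coefficient
    Returns (combined estimate, weight placed on the data).
    """
    a = sum(xi * xi / vi for xi, vi in zip(x, var))          # X' S^-1 X
    b = sum(xi * yi / vi for xi, yi, vi in zip(x, y, var))   # X' S^-1 Y
    beta_data = b / a                  # weighted least squares estimate
    lam = a / (a + 1.0 / prior_var)    # relative confidence in the data
    combined = lam * beta_data + (1.0 - lam) * prior_mean
    return combined, lam


# Hypothetical numbers: the data alone give 3.0 and the prior mean is 0.0
print(prior_weighted_estimate([1.0, 1.0], [2.0, 4.0], [1.0, 1.0], 0.0, 0.5))
```

A small prior variance drives the weight on the data toward zero, and a large one drives it toward one, so the combined estimate always lies between the prior mean and the weighted least squares estimate.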

4.  PROPER  ARRANGEMENT  OF  COEFFICIENTS  AND  DATA 

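To denote the kth data channel, for k = 1, 2, ..., q, prefix a k to
the subscripts of the previously introduced notation and obtain, in
particular: $X_{k1}(t), X_{k2}(t), \ldots, X_{kp_k}(t)$;
$\beta_{k1}, \beta_{k2}, \ldots, \beta_{kp_k}$; $Y_k(t)$ and
$\epsilon_k(t)$; $X_{kji} = X_{kj}(t_{ki})$ for j = 1, 2, ..., $p_k$,
$Y_{ki} = Y_k(t_{ki})$ and
$\epsilon_{ki} = \epsilon_k(t_{ki}) = Y_{ki} - \sum_{j=1}^{p_k}
\beta_{kj} X_{kji}$ for i = 1, 2, ..., $n_k$; and

$$X_{kj} = (X_{kj1}, X_{kj2}, \ldots, X_{kjn_k})'
\quad\text{for } j = 1, 2, \ldots, p_k,$$

$$X_k = (X_{k1}, X_{k2}, \ldots, X_{kp_k}),$$

$$Y_k = (Y_{k1}, Y_{k2}, \ldots, Y_{kn_k})', \quad
\beta_k = (\beta_{k1}, \beta_{k2}, \ldots, \beta_{kp_k})', \quad
\epsilon_k = (\epsilon_{k1}, \epsilon_{k2}, \ldots, \epsilon_{kn_k})',$$

$$\Sigma_{kk} = \begin{pmatrix}
\sigma_{k11}   & \sigma_{k12}   & \cdots & \sigma_{k1n_k} \\
\sigma_{k12}   & \sigma_{k22}   & \cdots & \sigma_{k2n_k} \\
\vdots         & \vdots         &        & \vdots         \\
\sigma_{k1n_k} & \sigma_{k2n_k} & \cdots & \sigma_{kn_kn_k}
\end{pmatrix}.$$
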

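The analogues of Equations (1) and (2) for the kth channel are now

$$Y_k(t) = \sum_{j=1}^{p_k} \beta_{kj} X_{kj}(t) + \epsilon_k(t) \tag{18}$$

and

$$Y_k = \sum_{j=1}^{p_k} \beta_{kj} X_{kj} + \epsilon_k
= X_k\beta_k + \epsilon_k \quad\text{for } k = 1, 2, \ldots, q. \tag{19}$$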


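Let

$$X = \begin{pmatrix}
X_1    & 0      & \cdots & 0      \\
0      & X_2    & \cdots & 0      \\
\vdots & \vdots &        & \vdots \\
0      & 0      & \cdots & X_q
\end{pmatrix}, \quad
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_q \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_q
\end{pmatrix}, \quad
\epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_q
\end{pmatrix},$$

which has the effect of stacking the data and coefficients associated with
channels 2, 3, ..., q under the data and coefficients associated with
channel 1. If $\sigma_{hkij}$ is the covariance between $\epsilon_{hi}$ and
$\epsilon_{kj}$ and the $n_h$-by-$n_k$ matrix $\Sigma_{hk}$ is

$$\Sigma_{hk} = \begin{pmatrix}
\sigma_{hk11}   & \sigma_{hk12}   & \cdots & \sigma_{hk1n_k}   \\
\sigma_{hk21}   & \sigma_{hk22}   & \cdots & \sigma_{hk2n_k}   \\
\vdots          & \vdots          &        & \vdots            \\
\sigma_{hkn_h1} & \sigma_{hkn_h2} & \cdots & \sigma_{hkn_hn_k}
\end{pmatrix},$$

the covariance matrix of $\epsilon_{11}, \epsilon_{12}, \ldots,
\epsilon_{1n_1}, \epsilon_{21}, \ldots, \epsilon_{q1}, \epsilon_{q2},
\ldots, \epsilon_{qn_q}$ is

$$\Sigma = \begin{pmatrix}
\Sigma_{11}  & \Sigma_{12}  & \cdots & \Sigma_{1q} \\
\Sigma_{12}' & \Sigma_{22}  & \cdots & \Sigma_{2q} \\
\vdots       & \vdots       &        & \vdots      \\
\Sigma_{1q}' & \Sigma_{2q}' & \cdots & \Sigma_{qq}
\end{pmatrix}.$$
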

Suppose that $\beta_{k_1j_1}, \beta_{k_2j_2}, \ldots, \beta_{k_rj_r}$ are
identical (i.e., the same coefficient appears in the linear models for
channels $k_1, k_2, \ldots, k_r$). All but one of them, for example all but
$\beta_{k_1j_1}$, are superfluous and should be eliminated from
consideration; and the pertinent columns of
$X_{k_1}, X_{k_2}, \ldots, X_{k_r}$ and $Y_{k_1}, Y_{k_2}, \ldots, Y_{k_r}$
(i.e., the pertinent data from channels $k_1, k_2, \ldots, k_r$) should be
utilized in the estimation of $\beta_{k_1j_1}$. This may be accomplished in
a simple (though not simple-appearing) manner, by: deleting
$\beta_{k_2j_2}, \beta_{k_3j_3}, \ldots, \beta_{k_rj_r}$ from $\beta$;
replacing column number $\sum_{k=1}^{k_1-1} p_k + j_1$ in X by the sum of
column numbers $\sum_{k=1}^{k_1-1} p_k + j_1$,
$\sum_{k=1}^{k_2-1} p_k + j_2$, ..., $\sum_{k=1}^{k_r-1} p_k + j_r$ in X
(which has the effect of aligning* the $j_2$th column of $X_{k_2}$, ...,
the $j_r$th column of $X_{k_r}$ under the $j_1$th column of $X_{k_1}$ in the
formation of the $(\sum_{k=1}^{k_1-1} p_k + j_1)$th column of X); and
deleting column numbers $\sum_{k=1}^{k_2-1} p_k + j_2$,
$\sum_{k=1}^{k_3-1} p_k + j_3$, ..., $\sum_{k=1}^{k_r-1} p_k + j_r$ from X.
Any one of $\beta_{k_1j_1}, \beta_{k_2j_2}, \ldots, \beta_{k_rj_r}$ other
than $\beta_{k_1j_1}$ could serve instead as the focal point for the
alignment.



After all duplications among the $\beta_{kj}$'s have been treated as indicated, the coefficients and data from all channels are properly arranged; and a properly arranged analogue of Equation (2) is given, in the same form as before, by

$$\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}. \qquad (20)$$
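The arrangement described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the FORTRAN implementation mentioned in the paper: the function names `block_diagonal` and `merge_shared` are hypothetical, matrices are plain lists of rows, and channels and columns are numbered from 1 as in the text. The first pair passed in `shared` is assumed to be the focal point of the alinement.

```python
def block_diagonal(blocks):
    """Stack the per-channel matrices X_1, ..., X_q into the block-diagonal X."""
    total_cols = sum(len(b[0]) for b in blocks)
    rows, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for r in b:
            # Zeros to the left and right of this channel's own columns.
            rows.append([0.0] * offset + list(r) + [0.0] * (total_cols - offset - width))
        offset += width
    return rows


def merge_shared(X, p, shared):
    """Aline the columns of one coefficient that appears in several channels.

    p      -- list of p_k, the number of coefficients in each channel
    shared -- list of (k, j) pairs (1-based) naming the same coefficient;
              the first pair is kept as the focal point (an assumption).
    """
    def column(k, j):
        # Column number p_1 + ... + p_{k-1} + j, converted to a 0-based index.
        return sum(p[:k - 1]) + j - 1

    cols = [column(k, j) for k, j in shared]
    keep, drop = cols[0], set(cols[1:])
    merged = []
    for row in X:
        new_row = list(row)
        new_row[keep] = sum(row[c] for c in cols)   # sum of the alined columns
        merged.append([v for c, v in enumerate(new_row) if c not in drop])
    return merged


X1 = [[1, 2], [3, 4]]            # channel 1: n_1 = 2, p_1 = 2
X2 = [[5, 6], [7, 8], [9, 10]]   # channel 2: n_2 = 3, p_2 = 2
X = block_diagonal([X1, X2])
# Suppose beta_12 and beta_21 are the same coefficient.
X_arranged = merge_shared(X, [2, 2], [(1, 2), (2, 1)])
```

The corresponding duplicate entries of the coefficient vector would be deleted in the same way; the data vector is simply the concatenation of the per-channel data vectors.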


A proper arrangement of the data from all channels has to be considered in determining the form of $\Sigma$ and in estimating its necessary variances and covariances for IWLS. In particular, the requirement of nonstationary, uncorrelated noise for this proper arrangement means that not only $\varepsilon_{ki}$ and $\varepsilon_{kj}$ (i.e., observations of the noise at any two times within any channel) must be uncorrelated, but also that $\varepsilon_{hi}$ and $\varepsilon_{kj}$ (i.e., observations of the noise at any two times in any two channels) must be uncorrelated.


One may view the $p \le \sum_{k=1}^{q} p_k$ resulting columns of $X$ as p vectors $\mathbf{X}_j$ of $n = \sum_{k=1}^{q} n_k$ observations $X_{ji}$ on the p component signals $X_j(t)$, the p elements of $\boldsymbol{\beta}$ as the p coefficients $\beta_j$ of $X_j(t)$, the n elements of $\mathbf{Y}$ as n observations $Y_i$ on the composite signal $Y(t)$, the n elements of $\boldsymbol{\varepsilon}$ as n observations $\varepsilon_i$ on the noise $\varepsilon(t)$, the $n^2$ elements of $\Sigma$ as the n variances of $\varepsilon_i$ and $n^2 - n$ covariances of $\varepsilon_i$ and $\varepsilon_j$, and Equation (20) as the matrix alter ego of Equation (1).


*Dr. J. C. Pinson proposed the alinement as a simple way to utilize the pertinent columns of $X_{k_1}, X_{k_2}, \ldots, X_{k_r}$.
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f 

5.  PARTITIONING,  SELECTING  AND  APPORTIONING*

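For exposition purposes, it is desirable to change the number of linearly independent component signals and corresponding unknown coefficients in the linear model to m; to redesignate them by $Z_1(t), Z_2(t), \ldots, Z_m(t)$ and $\alpha_1, \alpha_2, \ldots, \alpha_m$; and to transform $Z_k(t)$, via multiplication by a positive scale factor, so that the units of $Z_k(t)$ will become the units of $Y(t)$ and $\alpha_k$ will become unit-less for k = 1, 2, ..., m. This transformation will be compensated for in the technique described.

Let $Z_{ki} = Z_k(t_i)$ for i = 1, 2, ..., n and k = 1, 2, ..., m; $\alpha_1, \alpha_2, \ldots, \alpha_m$ vary randomly over the ensemble of all possible states of nature with means $\alpha_1^*, \alpha_2^*, \ldots, \alpha_m^*$ and covariance matrix $\Gamma^*$ of variances $\gamma_{kk}^* = \gamma_k^{*2}$ and covariances $\gamma_{hk}^*$; and

$$\mathbf{Z}_k = (Z_{k1}, Z_{k2}, \ldots, Z_{kn})' \quad \text{for } k = 1, 2, \ldots, m,$$

$$Z = (\mathbf{Z}_1, \mathbf{Z}_2, \ldots, \mathbf{Z}_m), \quad \boldsymbol{\alpha} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_m \end{pmatrix}, \quad \boldsymbol{\alpha}^* = \begin{pmatrix} \alpha_1^* \\ \alpha_2^* \\ \vdots \\ \alpha_m^* \end{pmatrix}$$

and

$$\Gamma^* = \begin{pmatrix} \gamma_{11}^* & \gamma_{12}^* & \cdots & \gamma_{1m}^* \\ \gamma_{12}^* & \gamma_{22}^* & \cdots & \gamma_{2m}^* \\ \vdots & \vdots & & \vdots \\ \gamma_{1m}^* & \gamma_{2m}^* & \cdots & \gamma_{mm}^* \end{pmatrix}.$$

Then the linear model, its matrix alter ego and the a priori information become $\sum_{k=1}^{m} \alpha_k Z_k(t)$, $\sum_{k=1}^{m} \alpha_k \mathbf{Z}_k = Z\boldsymbol{\alpha}$ and $\boldsymbol{\alpha}^*$ and $\Gamma^*$.

*The suggestion to investigate the feasibility of, and devise an analysis for, such a technique was made by Mr. H. J. Goldfisher and Mr. L. H. Pinson.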


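When m is large and/or some of $Z_1(t), Z_2(t), \ldots, Z_m(t)$ are highly-related, consideration of accuracy and ease of computation frequently dictates that $\sum_{k=1}^{m} \alpha_k Z_k(t)$ be replaced by a new linear model containing only representative component signals which are highly-descriptive, but not highly-related. Suppose that the p < m representative component signals, whose units are the units of $Y(t)$, and corresponding unit-less unknown coefficients are $X_1(t), X_2(t), \ldots, X_p(t)$ and $\beta_1, \beta_2, \ldots, \beta_p$; $X_{ji} = X_j(t_i)$ for i = 1, 2, ..., n and j = 1, 2, ..., p; $\beta_1, \beta_2, \ldots, \beta_p$ vary randomly over the ensemble of all possible states of nature with means $\beta_1^*, \beta_2^*, \ldots, \beta_p^*$ and covariance matrix $\Gamma$ of variances $\gamma_{jj} = \gamma_j^2$ and covariances $\gamma_{hj}$; and, in analogy with the definitions of $\mathbf{Z}_k$, $Z$, $\boldsymbol{\alpha}$ and $\boldsymbol{\alpha}^*$, $\mathbf{X}_j = (X_{j1}, X_{j2}, \ldots, X_{jn})'$, $X = (\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_p)$, $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_p)'$, $\boldsymbol{\beta}^* = (\beta_1^*, \beta_2^*, \ldots, \beta_p^*)'$ and

$$\Gamma = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1p} \\ \gamma_{12} & \gamma_{22} & \cdots & \gamma_{2p} \\ \vdots & \vdots & & \vdots \\ \gamma_{1p} & \gamma_{2p} & \cdots & \gamma_{pp} \end{pmatrix}.$$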

Hence the new linear model, its matrix alter ego and the corresponding a priori information are $\sum_{j=1}^{p} \beta_j X_j(t)$, $\sum_{j=1}^{p} \beta_j \mathbf{X}_j = X\boldsymbol{\beta}$ and $\boldsymbol{\beta}^*$ and $\Gamma$.


One manner of replacing $\sum_{k=1}^{m} \alpha_k Z_k(t)$ by $\sum_{j=1}^{p} \beta_j X_j(t)$, and in particular $Z\boldsymbol{\alpha}$ by $X\boldsymbol{\beta}$, is to partition $Z_1(t), Z_2(t), \ldots, Z_m(t)$ into subsets $S_1, S_2, \ldots, S_p$ of highly-related $Z_k(t)$'s and to select an $X_j(t)$ to represent $S_j$ (i.e., each $Z_k(t)$ in $S_j$) for j = 1, 2, ..., p. In addition, it is desirable to be able to apportion an estimator of each $\beta_j$ among the $\alpha_k$'s which correspond to the $Z_k(t)$'s in $S_j$.


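A measure of the degree of linear dependence between $Z_h(t)$ and $Z_k(t)$ is needed to accomplish this partitioning. One such measure, based upon $\mathbf{Z}_h$ and $\mathbf{Z}_k$, is the cosine

$$d_{hk} = d_{kh} = \frac{\mathbf{Z}_h'\mathbf{Z}_k}{[(\mathbf{Z}_h'\mathbf{Z}_h)(\mathbf{Z}_k'\mathbf{Z}_k)]^{1/2}}$$

of the angle between $\mathbf{Z}_h$ and $\mathbf{Z}_k$. If

$$\bar{Z}_k = \frac{1}{n}\sum_{i=1}^{n} Z_{ki}$$

and

$$\bar{\mathbf{Z}}_k = (\bar{Z}_k, \bar{Z}_k, \ldots, \bar{Z}_k)' \quad \text{for } k = 1, 2, \ldots, m,$$

then the sample correlation coefficient

$$r_{hk} = r_{kh} = \frac{\sum_{i=1}^{n} (Z_{hi} - \bar{Z}_h)(Z_{ki} - \bar{Z}_k)}{\left[\sum_{i=1}^{n} (Z_{hi} - \bar{Z}_h)^2 \sum_{i=1}^{n} (Z_{ki} - \bar{Z}_k)^2\right]^{1/2}} = \frac{(\mathbf{Z}_h - \bar{\mathbf{Z}}_h)'(\mathbf{Z}_k - \bar{\mathbf{Z}}_k)}{\left[(\mathbf{Z}_h - \bar{\mathbf{Z}}_h)'(\mathbf{Z}_h - \bar{\mathbf{Z}}_h)\,(\mathbf{Z}_k - \bar{\mathbf{Z}}_k)'(\mathbf{Z}_k - \bar{\mathbf{Z}}_k)\right]^{1/2}}$$

between $\mathbf{Z}_h$ and $\mathbf{Z}_k$ is the cosine of the angle between $\mathbf{Z}_h - \bar{\mathbf{Z}}_h$ and $\mathbf{Z}_k - \bar{\mathbf{Z}}_k$ and also a measure of the degree of linear dependence between $Z_h(t)$ and $Z_k(t)$. Observe that the multiplication of $Z_h(t)$ by $c_h$ and $Z_k(t)$ by $c_k$ does not change the magnitude of $d_{hk}$ and $r_{hk}$, and changes the sign of $d_{hk}$ and $r_{hk}$ only for $c_h$ and $c_k$ having different signs. It might be demonstrated that any three of the $d_{hk}$'s and any three of the $r_{hk}$'s satisfy the inequalities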


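$$-1 \le d_{hk} \le 1, \qquad -1 \le r_{hk} \le 1,$$

$$d_{hi}d_{ki} - [(1 - d_{hi}^2)(1 - d_{ki}^2)]^{1/2} \le d_{hk} \le d_{hi}d_{ki} + [(1 - d_{hi}^2)(1 - d_{ki}^2)]^{1/2} \qquad (25)$$

and

$$r_{hi}r_{ki} - [(1 - r_{hi}^2)(1 - r_{ki}^2)]^{1/2} \le r_{hk} \le r_{hi}r_{ki} + [(1 - r_{hi}^2)(1 - r_{ki}^2)]^{1/2}. \qquad (26)$$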
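As a hedged illustration of the two measures of linear dependence, the sketch below computes the cosine between two observation vectors and the sample correlation coefficient (the cosine after subtracting means), and checks the standard bound that the cosines among any three vectors must satisfy. The function names are hypothetical and the observation vectors are plain Python lists.

```python
import math


def cosine(zh, zk):
    """Cosine of the angle between two observation vectors."""
    dot = sum(a * b for a, b in zip(zh, zk))
    return dot / math.sqrt(sum(a * a for a in zh) * sum(b * b for b in zk))


def correlation(zh, zk):
    """Sample correlation: the cosine between the mean-centered vectors."""
    mh, mk = sum(zh) / len(zh), sum(zk) / len(zk)
    return cosine([a - mh for a in zh], [b - mk for b in zk])


def within_bounds(d_hk, d_hi, d_ki, tol=1e-12):
    """Check |d_hk - d_hi*d_ki| <= sqrt((1 - d_hi^2)(1 - d_ki^2))."""
    # max(0, ...) guards against tiny negative values caused by rounding.
    half_width = math.sqrt(max(0.0, (1 - d_hi ** 2) * (1 - d_ki ** 2)))
    return abs(d_hk - d_hi * d_ki) <= half_width + tol


z1, z2, z3 = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]
d12, d13, d23 = cosine(z1, z2), cosine(z1, z3), cosine(z2, z3)
# Multiplying one signal by a negative constant flips the sign only.
d12_flipped = cosine([-2.0 * a for a in z1], z2)
```

The cosine is the relevant measure when no constant term is in the model; the correlation is its counterpart when a constant term is included.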
Consequently, it is feasible to utilize either $d_{hk}$ or $r_{hk}$ in a partitioning procedure for obtaining $S_1, S_2, \ldots, S_p$ from $Z_1(t), Z_2(t), \ldots, Z_m(t)$. The pertinent one to use is $d_{hk}$ when a constant term is not included in the linear model and $r_{hk}$ when a constant term is included in the linear model. Without loss of generality, $d_{hk}$ will be used in the text; but it should be replaced by $r_{hk}$ where appropriate.


A reasonable partitioning procedure should not be affected by a renumbering of $Z_1(t), Z_2(t), \ldots, Z_m(t)$. The procedure also ought to yield $S_1, S_2, \ldots, S_p$ with a relatively high degree of linear dependence existing between any two $Z_k(t)$'s in the same $S_j$ and a relatively low degree of linear dependence existing between any two $Z_k(t)$'s in different $S_j$'s. Finally, it should provide some control over the size of p.

The partitioning procedure employed by EIWLS*:

1.  Computes

$$d_k = \sum_{h=1}^{k-1} |d_{hk}| + \sum_{h=k+1}^{m} |d_{hk}| = \sum_{h=1}^{m} |d_{hk}| - 1 \qquad (27)$$

for k = 1, 2, ..., m.

2.  Selects that $Z_k(t)$, say $Z_{k_1}(t)$, whose $d_k$ is smallest.

3.  Puts $Z_k(t)$ into $S_1$ if and only if $|d_{kk_1}| \ge d$, where cos 45° = 0.70711 < d < 1 is a preassigned constant which should be selected to reflect the desire for the existence of a relatively high degree of linear dependence between any two $Z_k(t)$'s in the same $S_j$ and a relatively low degree of linear dependence between any two $Z_k(t)$'s in different $S_j$'s, and the desire for a relatively small p.

4.  Selects that $Z_k(t)$ not in $S_1$, say $Z_{k_2}(t)$, whose $d_k$ is smallest among the $Z_k(t)$'s not in $S_1$.

5.  For $Z_k(t)$ not in $S_1$, puts it into $S_2$ if and only if $|d_{kk_2}| \ge d$; and for $Z_k(t)$ in $S_1$, removes it from $S_1$ and puts it into $S_2$ if and only if $|d_{kk_2}| > |d_{kk_1}|$.

6.  Continues in the same manner and selects that $Z_k(t)$ not in $S_1, S_2, \ldots, S_{j-2}$ or $S_{j-1}$, say $Z_{k_j}(t)$, whose $d_k$ is smallest among the $Z_k(t)$'s not in $S_1, S_2, \ldots, S_{j-2}$ or $S_{j-1}$.

7.  For $Z_k(t)$ not in $S_1, S_2, \ldots, S_{j-2}$ or $S_{j-1}$, puts it into $S_j$ if and only if $|d_{kk_j}| \ge d$; and for $Z_k(t)$ in $S_1, S_2, \ldots, S_{j-2}$ or $S_{j-1}$, removes it from $S_1, S_2, \ldots, S_{j-2}$ or $S_{j-1}$ and puts it into $S_j$ if and only if $|d_{kk_j}| > \max_{h=1,2,\ldots,j-1} |d_{kk_h}|$.

8.  Continues in the same manner until selecting a $Z_k(t)$ not in $S_1, S_2, \ldots, S_{p-2}$ or $S_{p-1}$, say $Z_{k_p}(t)$, for which $|d_{kk_p}| \ge d$ for all the $Z_k(t)$'s not in $S_1, S_2, \ldots, S_{p-2}$ or $S_{p-1}$ and, thereby, defining p.

9.  For $Z_k(t)$ not in $S_1, S_2, \ldots, S_{p-2}$ or $S_{p-1}$, puts it into $S_p$; and for $Z_k(t)$ in $S_1, S_2, \ldots, S_{p-2}$ or $S_{p-1}$, removes it from $S_1, S_2, \ldots, S_{p-2}$ or $S_{p-1}$ and puts it into $S_p$ if and only if $|d_{kk_p}| > \max_{j=1,2,\ldots,p-1} |d_{kk_j}|$.

10.  Inspects the resulting $S_1, S_2, \ldots, S_p$ to determine if $|d_{hk}|$ is sufficiently large for all $Z_h(t)$ and $Z_k(t)$ in the same $S_j$, if $|d_{hk}|$ is sufficiently small for all $Z_h(t)$ and $Z_k(t)$ in different $S_j$'s and if p is sufficiently small.

11.  Modifies $S_1, S_2, \ldots, S_p$ and d until 10 is satisfied to a reasonable extent.

*Mr. P. L. Hsu developed the details of this procedure.
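Steps 1 through 9 of the procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than the EIWLS program itself: the function name `partition` is hypothetical, the input `D` is the symmetric m-by-m matrix of cosines with ones on the diagonal, signals are indexed from 0, admission uses "greater than or equal to d", and ties for the smallest score are broken by taking the lowest index. Steps 10 and 11 (inspection and manual modification) are left to the analyst.

```python
import math


def partition(D, d):
    """Partition signals 0..m-1 into subsets of highly-related signals."""
    if not math.cos(math.radians(45)) < d < 1:
        raise ValueError("d must lie strictly between cos 45 degrees and 1")
    m = len(D)
    # Step 1: score_k = sum over h != k of |d_hk| (the diagonal term 1 is removed).
    score = [sum(abs(D[h][k]) for h in range(m)) - 1.0 for k in range(m)]
    label = [None] * m   # label[k] = index of the subset currently holding signal k
    focal = []           # focal[j] = signal that seeded subset j
    while None in label:
        # Steps 2, 4, 6, 8: seed a new subset with the unassigned signal
        # whose score is smallest.
        k_new = min((k for k in range(m) if label[k] is None),
                    key=lambda k: score[k])
        j = len(focal)
        for k in range(m):
            closeness = abs(D[k][k_new])
            if label[k] is None:
                # Unassigned signal: admit it iff it is close enough to the seed.
                if closeness >= d:
                    label[k] = j
            elif closeness > max(abs(D[k][kh]) for kh in focal):
                # Assigned signal: move it iff it is closer to the new seed
                # than to every earlier seed.
                label[k] = j
        focal.append(k_new)
    subsets = [[k for k in range(m) if label[k] == j] for j in range(len(focal))]
    return subsets, focal


D = [[1.00, 0.95, 0.10, 0.20],
     [0.95, 1.00, 0.10, 0.20],
     [0.10, 0.10, 1.00, 0.90],
     [0.20, 0.20, 0.90, 1.00]]
subsets, focal = partition(D, 0.8)
```

Because each seed is always admitted to its own subset, the loop assigns at least one signal per pass and therefore terminates; the number of passes is the resulting p.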


It is notationally convenient to define $0 = m_0 < m_1 < m_2 < \ldots < m_{p-1} < m_p = m$ and renumber $Z_1(t), Z_2(t), \ldots, Z_m(t)$ with $Z_{m_{j-1}+1}(t), Z_{m_{j-1}+2}(t), \ldots, Z_{m_j}(t)$ being in $S_j$ for j = 1, 2, ..., p. Using Equation (25), it might also be demonstrated that each $Z_k(t)$ in $S_j$ may be transformed, via multiplication by $d_{m_jk}/|d_{m_jk}|$, so that $d_{hk}$ becomes positive when $Z_h(t)$ and $Z_k(t)$ are in $S_j$ for j = 1, 2, ..., p. This transformation will also be compensated for in the ensuing discussion.

The replacement of $\sum_{k=1}^{m} \alpha_k Z_k(t)$ by $\sum_{j=1}^{p} \beta_j X_j(t)$ means that $X_1(t), X_2(t), \ldots, X_p(t)$ should be selected to satisfy

$$\sum_{j=1}^{p} \beta_j X_j(t) = \sum_{k=1}^{m} \alpha_k Z_k(t). \qquad (28)$$
In addition, the representation of $S_j$ by $X_j(t)$ suggests that $X_j(t)$ should be selected to satisfy

$$\beta_j X_j(t) = \sum_{k=m_{j-1}+1}^{m_j} \alpha_k Z_k(t) \qquad (29)$$

and

$$X_j(t) = \sum_{k=m_{j-1}+1}^{m_j} v_k Z_k(t), \qquad (30)$$

where $v_{m_{j-1}+1}, v_{m_{j-1}+2}, \ldots, v_{m_j}$ are non-negative weights whose sum is one, for j = 1, 2, ..., p. Simple analytic consequences of the validity of Equations (29) and (30) for all values of t are

$$\alpha_k = v_k \beta_j \quad \text{for } k = m_{j-1}+1, m_{j-1}+2, \ldots, m_j \text{ and} \qquad (31)$$

$$\beta_j = \sum_{k=m_{j-1}+1}^{m_j} \alpha_k \quad \text{for } j = 1, 2, \ldots, p. \qquad (32)$$


Regardless of the true relationships (or lack of them) among $\alpha_{m_{j-1}+1}, \alpha_{m_{j-1}+2}, \ldots, \alpha_{m_j}$, the selecting of $X_j(t)$ to represent $S_j$ (i.e., Equations (29) and (30)) induces analytic relationships among $\alpha_{m_{j-1}+1}, \alpha_{m_{j-1}+2}, \ldots, \alpha_{m_j}$ and $\beta_j$ (i.e., Equations (31) and (32)) which cause them to become analytically and probabilistically indistinguishable (i.e., yield all of them once any one of them is determined) for j = 1, 2, ..., p. The selecting, in turn, induces a web of analytic relationships, which are not difficult to derive, into the structure of the a priori information:


$$\alpha_k^* = v_k \beta_j^* \quad \text{for } k = m_{j-1}+1, m_{j-1}+2, \ldots, m_j; \qquad (33)$$

$$\beta_j^* = \sum_{k=m_{j-1}+1}^{m_j} \alpha_k^*; \qquad (34)$$

$$\gamma_k^{*2} = \gamma_{kk}^* = v_k^2 \gamma_j^2 \quad \text{for } k = m_{j-1}+1, m_{j-1}+2, \ldots, m_j; \qquad (35)$$

$$\gamma_{hk}^* = v_h v_k \gamma_j^2 = \gamma_h^* \gamma_k^* \quad \text{for } k = h+1, h+2, \ldots, m_j \text{ and } h = m_{j-1}+1, m_{j-1}+2, \ldots, m_j - 1; \text{ and} \qquad (36)$$

$$\gamma_j = \sum_{k=m_{j-1}+1}^{m_j} \gamma_k^* \quad \text{for } j = 1, 2, \ldots, p. \qquad (37)$$


Consideration of this structure produces*

$$v_k = \gamma_k^* \Big/ \sum_{h=m_{j-1}+1}^{m_j} \gamma_h^* = \gamma_k^*/\gamma_j \quad \text{for } k = m_{j-1}+1, m_{j-1}+2, \ldots, m_j \text{ and } j = 1, 2, \ldots, p \qquad (38)$$

as the natural set** of weights to use in Equations (30) and (31). It follows immediately that

$$\frac{\alpha_k - \alpha_k^*}{\gamma_k^*} = \frac{\beta_j - \beta_j^*}{\gamma_j},$$

and so $\alpha_k$ and $\beta_j$ occur with equal probability when $\alpha_k$, hence $\beta_j$, is normally distributed for $k = m_{j-1}+1, m_{j-1}+2, \ldots, m_j$ and j = 1, 2, ..., p.

*These weights were proposed by Mr. C. M. Shipplett.

**An alternative set, which assigns equal weight to $Z_{m_{j-1}+1}(t), Z_{m_{j-1}+2}(t), \ldots, Z_{m_j}(t)$ in Equation (30) and to $\alpha_{m_{j-1}+1}, \alpha_{m_{j-1}+2}, \ldots, \alpha_{m_j}$ in Equation (31), is given by $v_k = 1/(m_j - m_{j-1})$ for $k = m_{j-1}+1, m_{j-1}+2, \ldots, m_j$ and j = 1, 2, ..., p.


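Hence, the selecting of $X_1(t), X_2(t), \ldots, X_p(t)$ from $Z_1(t), Z_2(t), \ldots, Z_m(t)$ is obtained by

$$X_j(t) = \sum_{k=m_{j-1}+1}^{m_j} \frac{\gamma_k^*}{\gamma_j} Z_k(t) \quad \text{for } j = 1, 2, \ldots, p; \qquad (39)$$

the selecting of $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_p$ from $\mathbf{Z}_1, \mathbf{Z}_2, \ldots, \mathbf{Z}_m$ is accomplished by

$$X_{ji} = \sum_{k=m_{j-1}+1}^{m_j} \left( \gamma_k^* \Big/ \sum_{h=m_{j-1}+1}^{m_j} \gamma_h^* \right) Z_{ki} \quad \text{for } i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, p, \qquad (40)$$

in particular; and the apportioning of an estimator $\hat{\beta}_j$ of $\beta_j$ among $\alpha_{m_{j-1}+1}, \alpha_{m_{j-1}+2}, \ldots, \alpha_{m_j}$ is accomplished by

$$\hat{\alpha}_k = \left( \gamma_k^* \Big/ \sum_{h=m_{j-1}+1}^{m_j} \gamma_h^* \right) \hat{\beta}_j \quad \text{for } k = m_{j-1}+1, m_{j-1}+2, \ldots, m_j \text{ and } j = 1, 2, \ldots, p. \qquad (41)$$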

The combination of the two previous transformations to each $Z_k(t)$ in $S_j$ (i.e., its multiplication by the product of the original $d_{m_jk}/|d_{m_jk}|$ and the corresponding positive scale factor) may now be compensated for by performing the same combination of transformations to the corresponding $\hat{\alpha}_k$ (i.e., its multiplication by the product of the original $d_{m_jk}/|d_{m_jk}|$ and the corresponding positive scale factor). Indeed, it might be proved that these transformations may be eliminated by replacing $v_k$ in Equations (30), (31), (33), (35), (36), (39), (40) and (41) with the product of the original $d_{m_jk}/|d_{m_jk}|$, the corresponding positive scale factor and $v_k$.
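For a single subset, the weighting, selecting and apportioning steps reduce to a few lines. The sketch below is illustrative only: the function names are hypothetical, `gamma_star` holds the a priori standard deviations of the coefficients in the subset, and the sign and scale-factor compensation described above is assumed to have been applied already.

```python
def natural_weights(gamma_star):
    """Weights v_k = gamma_k* / (sum of gamma_h* over the subset)."""
    total = sum(gamma_star)
    return [g / total for g in gamma_star]


def select_representative(Z_subset, weights):
    """Observations X_ji of the representative signal: sum over k of v_k * Z_ki."""
    n = len(Z_subset[0])
    return [sum(v * z[i] for v, z in zip(weights, Z_subset)) for i in range(n)]


def apportion(beta_hat_j, weights):
    """Split an estimate of beta_j among the alpha_k of the subset: v_k * beta_hat_j."""
    return [v * beta_hat_j for v in weights]


gamma_star = [1.0, 3.0]                 # a priori standard deviations in the subset
Z_subset = [[4.0, 8.0], [0.0, 4.0]]     # two signals, n = 2 observations each
v = natural_weights(gamma_star)
X_j = select_representative(Z_subset, v)
alpha_hat = apportion(2.0, v)
```

Since the weights sum to one, the apportioned estimates always add back up to the estimate of the representative coefficient, in agreement with Equation (32).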

6.  DISCUSSION 

A  procedure  has  been  described  for  estimation  of  the  coefficients  for 
linearly  independent  component  signals  in  a  linear  model  from  observations 
on  the  component  signals  and  a  composite  signal  containing  the  linear  model 
plus  noise  which  is  nonstationary  and/or  correlated  with  unknown  covariance 
matrix,  when  the  coefficients  may  vary  randomly  over  all  possible  states  of 
nature  and  they  may  appear  in  the  linear  models  for  more  than  one  data 
channel,  and  when  there  may  be  a  large  number  of  component  signals  in  the 
linear  model  and  some  of  them  may  be  highly- related.  The  procedure, 
denoted by EIWLS: properly arranges the coefficients and data from all
channels;  partitions  the  resulting  set  of  component  signals  into  subsets  of 
highly- related  ones  and  selects  a  representative  component  signal  from  each 
subset  to  include  in  the  new  linear  model;  obtains  the  least  squares  estima¬ 
tors  of  the  coefficients  in  the  new  linear  model,  and  then  iteratively  calcu¬ 
lates  an  estimator  of  the  noise  covariance  matrix  and  obtains  the  weighted 
least squares estimators of these coefficients which are determined by the
inverse  of  the  matrix  estimator;  incorporates  the  a  priori  information  into 
estimators  of  the  coefficients  in  the  new  linear  model;  and  apportions  an 
estimator  for  the  coefficient  of  the  representative  component  signal  for  a 
subset  among  the  coefficients  of  component  signals  in  that  subset. 

During  the  development  of  EIWLS,  the  difficulty  of  the  generalized 
statistical  regression  problem  and  the  requirement  for  an  operational  analysis 
within  a  reasonable  period  of  time  combined  to  dictate  that  the  security  of 
rigorous  theoretical  proof  be  occasionally  sacrificed  for  the  expedience  of 
apparent  theoretical  implication  plus  intuitive  justification.  The  computer 
provided  a  feasible  means  to  implement  the  procedure  and  to  empirically 
test  it  on  examples,  when  the  noise  is  nonstationary  and  uncorrelated. 
Empirical  testing  of  the  procedure  and  its  parts  has  included  not  only  the 
analysis  of  fabricated  examples  (e.  g. ,  those  in  Section  8  and  Ref  1),  but 
also  the  successful  evaluation  of  inertial  navigation  systems  via  the  analysis 
of  real  field-test  data.  Therefore,  EIWLS  is  presented  as  a  theoretically 
promising,  intuitively  appealing  and  empirically  tested  (to  a  limited  extent) 
solution  for  the  generalized  statistical  regression  problem. 
The feasibility of IWLS depends upon the practical inversion of the estimators of the noise covariance matrix $\Sigma$. This inversion has to be investigated in the case of correlated noise. When the noise is stationary and correlated and the observations are taken at different and equally-spaced times, the form of $\Sigma$ is particularly simple and inversion may not be too difficult. A simplifying assumption, which is tenable in a large number of scientific problems, is that the noise covariance $\sigma(t, t')$ becomes zero as the time difference $|t - t'|$ increases beyond a certain limit. Then $\Sigma$ has blocks of zero elements and inversion by partitioning may be practical.

Consider the IWLS estimator $\hat{\beta}_1^{(c)}, \hat{\beta}_2^{(c)}, \ldots, \hat{\beta}_p^{(c)}$ of the coefficients $\beta_1, \beta_2, \ldots, \beta_p$ for c = 1, 2, ..., c*. In order to provide estimators of $\beta_1, \beta_2, \ldots, \beta_p$ of sufficient stability for a particular application, $\hat{\beta}_1^{(c)}, \hat{\beta}_2^{(c)}, \ldots, \hat{\beta}_p^{(c)}$ need not converge to $\hat{\beta}_1^{(c^*)}, \hat{\beta}_2^{(c^*)}, \ldots, \hat{\beta}_p^{(c^*)}$, but only possess a valid measure of change (e.g., $\max_{j=1,2,\ldots,p} |(\hat{\beta}_j^{(c)} - \hat{\beta}_j^{(c-1)})/\hat{\beta}_j^{(c)}|$) which becomes less than a particular preassigned constant for $c = c_0, c_0+1, \ldots, c^*$.

Suppose that $|(\hat{\beta}_j^{(c)} - \hat{\beta}_j^{(c-1)})/\hat{\beta}_j^{(c)}| \le b_j$ for $c = c_j, c_j+1, \ldots, c^*$. Then $\hat{\beta}_j^{(c_j)}, \hat{\beta}_j^{(c_j+1)}, \ldots, \hat{\beta}_j^{(c^*)}$ appear to be approaching or oscillating in a band (whose width depends upon $b_j$) about an asymptotic value and could probably be combined, by some technique (e.g., extrapolation), to yield an improved estimator of $\beta_j$. This improved estimation of $\beta_j$ should be investigated. A measure of the inherent accuracy in the estimation of $\beta_j$, which depends upon the matrix $X$ of component signal observations and the vector $\mathbf{Y}$ of composite signal observations (and, implicitly, the vector $\boldsymbol{\varepsilon}$ of noise observations), is provided by $b_j$ and $c_j$. Note that there may be a considerable variation among the inherent accuracy in the estimation of $\beta_1, \beta_2, \ldots, \beta_p$.
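One way to read the measure of change is sketched below, assuming the change at iteration c is taken relative to iteration c - 1 and that no current estimate is exactly zero. The function names are hypothetical; `estimates[c - 1]` holds the p estimates produced at iteration c.

```python
def relative_change(previous, current):
    """Largest relative change max_j |(current_j - previous_j) / current_j|."""
    # Assumes every current estimate is nonzero.
    return max(abs((c - p) / c) for p, c in zip(previous, current))


def first_stable_iteration(estimates, tolerance):
    """Smallest iteration c0 (1-based) such that the measure of change stays
    below the tolerance for c = c0, c0 + 1, ..., c*; None if there is none."""
    c0 = None
    # Walk backwards from the last iteration while the change stays small.
    for idx in range(len(estimates) - 1, 0, -1):
        if relative_change(estimates[idx - 1], estimates[idx]) < tolerance:
            c0 = idx + 1
        else:
            break
    return c0


history = [[1.0, 2.0], [1.5, 2.5], [1.5, 2.5], [1.5, 2.5]]
c0 = first_stable_iteration(history, 0.01)
```

The same loop applied to a single coefficient, with its own bound, gives the pair of quantities that the text uses to describe the inherent accuracy of that coefficient's estimation.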

At each iteration, the guiding principle of IWLS is to provide the optimum estimators of $\beta_1, \beta_2, \ldots, \beta_p$ and then $\Sigma$, based upon the available information at that iteration concerning $\Sigma$ and then $\beta_1, \beta_2, \ldots, \beta_p$. This principle is similar to Bellman's principle of
optimality  in  dynamic  programming  (Ref  8,  page  83),  which  states 
that  "an  optimal  policy  has  the  property  that  whatever  the  initial 
state  and  initial  decision  are,  the  remaining  decisions  must  con¬ 
stitute  an  optimal  policy  with  regard  to  the  state  resulting  from  the 
first  decision."  In  addition,  IWLS  extracts  considerably  more  infor¬ 
mation  from  the  data  than  does  least  squares  and  may  be  termed  an 
"estimation  servomechanism"  or  an  "adaptive  estimation  procedure." 
It is not at all unreasonable to conjecture that IWLS will produce, except perhaps under pathological conditions, near-optimum estimators of $\beta_1, \beta_2, \ldots, \beta_p$ and $\Sigma$. However, the stability and

estimation  properties  of  IWLS  should  be  further  investigated,  both 
theoretically  and  empirically. 


It is informative to elaborate upon two previously mentioned characteristics of the optimum estimators for $\beta_1, \beta_2, \ldots, \beta_p$. The first characteristic is that they minimize the appropriate statistical distance between the data vector $\mathbf{Y}$ and the estimator of the matrix alter ego for the linear model $X\boldsymbol{\beta}$. One has no knowledge whatsoever of the difference between the linear model $\sum_{j=1}^{p} \beta_j X_{ji}$ at time $t_i$ and its resulting estimator or of the difference between $\beta_j$ and its estimator, but only knowledge of the probabilistic properties of these differences. The second characteristic is that for the optimum estimators of $\beta_1, \beta_2, \ldots, \beta_p$, each of these differences has mean zero and minimum variance among all corresponding differences resulting from unbiased linear estimators of $\beta_1, \beta_2, \ldots, \beta_p$. In other words, the optimum estimators of $\beta_1, \beta_2, \ldots, \beta_p$ do not always produce an estimator which is closest to the quantity being estimated; but they do produce an estimator whose probability distribution is more tightly spread about this quantity than the probability distribution of any estimator resulting from unbiased linear estimators (e.g., the least squares estimators) of $\beta_1, \beta_2, \ldots, \beta_p$.

The three sets of estimators of $\beta_1, \beta_2, \ldots, \beta_p$ that incorporate the a priori information, one set given by Equation (16) and the other two sets listed in the footnote on page 18, have been empirically tested on an example for which $\beta_1, \beta_2, \ldots, \beta_p$ had mean zero and covariance matrix $\Gamma = \mathrm{diag}(\gamma_1^2, \gamma_2^2, \ldots, \gamma_p^2)$ with variances $\gamma_1^2, \gamma_2^2, \ldots, \gamma_p^2$ that varied. Although this testing provided empirical justification for the use of Equation (16), the results were somewhat inconclusive, since the version of the scalar weight presented in that footnote was too insensitive to the variation in $\gamma_1^2, \gamma_2^2, \ldots, \gamma_p^2$ and required an extreme variation in them to itself vary from zero to one. Additional empirical testing of the three sets of estimators that incorporate the a priori information, utilizing a more satisfactory version of the scalar weight, is needed. The improved estimators of $\beta_1, \beta_2, \ldots, \beta_p$ and the appropriately modified version of $\hat{\Sigma}^{(c^*)}$, rather than $\hat{\beta}_1^{(c^*)}, \hat{\beta}_2^{(c^*)}, \ldots, \hat{\beta}_p^{(c^*)}$ and $\hat{\Sigma}^{(c^*)}$, should be used to calculate these estimators.
Since the coefficients $\alpha_{m_{j-1}+1}, \alpha_{m_{j-1}+2}, \ldots, \alpha_{m_j}$ of the component signals in the subset $S_j$ become analytically and probabilistically indistinguishable from the coefficient $\beta_j$ of the representative component signal $X_j(t)$ for $S_j$ for j = 1, 2, ..., p during selecting and apportioning, the procedure used to partition the component signals $Z_1(t), Z_2(t), \ldots, Z_m(t)$ into subsets $S_1, S_2, \ldots, S_p$ is very important. Hence, the possibility of improving upon
the  partitioning  procedure  employed  by  EIWLS  ought  to  be  investigated.  One 
approach  would  be  to  define  a  measure  of  the  quality  of  a  partition  and  to 
choose  that  procedure  which  maximized  this  measure. 


7.  SURVEY  OF  RELATED  LITERATURE 

In  order  to  place  EIWLS  in  proper  perspective,  a  survey  of  related 
literature  is  now  presented.  This  literature  is  grouped,  for  convenience, 
into  seven  categories  of  articles:  those  treating  the  properties  of  the  least 
squares  estimators,  a  set  of  weighted  least  squares  estimators  or  the 
optimum  estimators  (Ref  9-16);  those  treating  estimation  of  residuals  or 
iterative  estimation  procedures  (Ref  17-26);  those  treating  estimation  pro¬ 
cedures  for  correlated  noise  (Ref  27-30);  those  treating  estimation  proce¬ 
dures  which  incorporate  a  priori  information  (Ref  31-36);  and  those  treating 
procedures  for  replacing  the  linear  model  by  a  new  linear  model  and 
estimating the coefficients in the new linear model (Ref 4, 21, 34, 35, 37,
38,  39  and  40). 


Grenander and Rosenblatt (Ref 9, Sections 7.1 - 7.4) derive important asymptotic properties of the least squares estimators and the optimum estimators of the coefficients $\beta_1, \beta_2, \ldots, \beta_p$ when the noise is stationary. The relationships among the columns of the component signal matrix $X$ and the eigenvectors of the noise covariance matrix $\Sigma$ and the conditions on the eigenvalues of $\Sigma$, required for the least squares estimators and the optimum estimators of $\beta_1, \beta_2, \ldots, \beta_p$ to be equal, are obtained by Muller and Watson (Ref 10). Using the eigenvalues of certain matrices, Magness and McGuire (Ref 11) compare the covariance matrices of the least squares estimators and the optimum estimators of $\beta_1, \beta_2, \ldots, \beta_p$; and Golub (Ref 12) compares the covariance matrices of a set of weighted least squares estimators and the optimum estimators of $\beta_1, \beta_2, \ldots, \beta_p$.
Zyskind (Ref 13) and Watson (Ref 14) discuss the estimation problem in general; and necessary and sufficient conditions for the least squares and the optimum estimators of $\beta_1, \beta_2, \ldots, \beta_p$ to be equal, or effectively equal, in particular. The contribution of errors in estimating weights to the variances of the estimators of $\beta_1, \beta_2, \ldots, \beta_p$, for nonstationary and uncorrelated noise and for a special case of correlated noise, is treated by Williams (Ref 15). Both Williams (Ref 15) and McElroy (Ref 16) state necessary and sufficient conditions for the least squares and the optimum estimators of $\beta_1, \beta_2, \ldots, \beta_p$ to be equal. The above asymptotic properties, relationships, conditions and comparisons should be useful in the investigation of procedures such as IWLS.


Material relevant to the estimation of regression residuals, and thereby $\boldsymbol{\varepsilon}$, is presented by Theil (Ref 17 and 18) and Koerts (Ref 19). When
the  noise  is  nonstationary  and  uncorrelated  with  variance 


<r2(t) 


P 

E  wt) 

j=i  J  j 


Prais and Aitchison (Ref 20) propose an iterative estimation procedure that is IWLS with the appropriate estimator of the noise covariance matrix $\Sigma$; and Fisher (Ref 21) proposes two iterative solutions (modified forms of Newton's method of approximation) to the maximum likelihood equations resulting from normally distributed noise. Also analogous to IWLS is an iterative estimation procedure described by Turner, Monroe and Lucas (Ref 22) for the case of the linear model being replaced by the quotient of two polynomials in the component signal $X(t)$ and the noise being stationary and uncorrelated. Mandel (Ref 23) develops an iterative estimation procedure, closely related to IWLS, to treat the linear model $\beta_1 + \beta_2 X(t)$ with nonstationary, uncorrelated noise. The convergence
properties  of  p  are  treated  by  Telser  (Ref  24). 


Iterative  estimation  procedures  which  iterate  over  time,  combining 
past  data  and  estimators  with  new  data,  have  appeared  for  some  time  in  the 
engineering  literature.  Two  early  examples  of  such  procedures  are  con¬ 
tained  in  Ref  25  and  26. 
An excellent survey of articles concerned with estimation (and hypothesis testing) procedures for correlated noise is presented by Anderson (Ref 27). In the event of a linear model $\beta_1 + \beta_2 X(t)$ and stationary, correlated noise that is normally distributed with


*2 
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Murthy  (Ref  28)  discusses  an  iterative  solution  to  the  maximum  likelihood 
equations  and  an  explicit  criterion  for  its  convergence.  Goodman  (Ref  29) 
suggests  a  noniterative  procedure,  which  performs  the  estimation  in  the 
frequency  domain  rather  than  in  the  time  domain,  when  the  noise  is  station¬ 
ary  and  correlated.  The  estimation  problem,  with  a  system  of  linear  models 
for  several  data  channels  and  noise  correlated  over  both  channels  and  time, 
is  covered  by  Parks  (Ref  30). 


From  two  different  points  of  view,  Raiffa  and  Schlaifer  (Ref  31, 
Sections 13.2 - 13.7) and Theil (Ref 32) cover the incorporation of a priori information into the estimation procedure for stationary, uncorrelated noise which is normally distributed with variance $\sigma^2$ either known or unknown. Gunckel (Ref 33) incorporates a priori information into estimators that reduce to the estimators of $\beta_1, \beta_2, \ldots, \beta_p$ in Equation (16) when the noise has (mean zero and) covariance matrix $\hat{\Sigma}^{(c^*)}$. The incorporation of a priori information described by Drucker (Ref 34) is in essentially
the  same  form  as  J3*  -  (Pi*.  Pg*.  •••»*Pp*)'*  Chipman  JRef  35)^  presents 
material  concerning  the  properties  of  P5*  (plus  perhaps  |3©  and  pik),  as  well 
as  to  partitioning,  selecting  and  apportioning.  Judge  and  Takayama  (Ref  36) 
treat  inequality  restrictions,  for  incorporating  a  priori  information  into  the 
estimation  procedure. 
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Efroymson  (Ref  4)  proposes  a  stepwise  selection  of  representative 
component signals via selection of: the best component signal $Z_k(t)$ to use as
the representative component signal $X_1(t)$ in the linear model $\beta_1 X_1(t)$;
the best $Z_k(t)$ to use with $X_1(t)$ as the representative component signal
$X_2(t)$ in the linear model $\beta_1 X_1(t) + \beta_2 X_2(t)$; ...; and the best
$Z_k(t)$ to use with $X_1(t), X_2(t), \ldots, X_{p-1}(t)$ as the representative
component signal $X_p(t)$ in the linear model $\sum_{j=1}^{p} \beta_j X_j(t)$.
A selection of representative component signals via an iterative nomination,
of the best subset of $p-p'$ $Z_k(t)$'s to use with a subset
$X_1(t), X_2(t), \ldots, X_{p'}(t)$ of $p'$ $Z_k(t)$'s as the representative
component signals $X_{p'+1}(t), X_{p'+2}(t), \ldots, X_p(t)$ in the linear model
$\sum_{j=1}^{p} \beta_j X_j(t)$, and of the best subset of $p'$ $Z_k(t)$'s to use
with that subset $X_{p'+1}(t), X_{p'+2}(t), \ldots, X_p(t)$ of $p-p'$ $Z_k(t)$'s as
the representative component signals $X_1(t), X_2(t), \ldots, X_{p'}(t)$ in the
linear model $\sum_{j=1}^{p} \beta_j X_j(t)$, until two subsets renominate each
other, is suggested by Villone, McCornack and Wood (Ref 37). Both of these
procedures simultaneously provide least squares estimators of
$\beta_1, \beta_2, \ldots, \beta_p$. However, the two procedures which are most
similar to partitioning, selecting and apportioning are the procedure for
approximating by representative component signals of Fisher (Ref 21) and the
procedure for grouping, selecting and apportioning of Drucker (Ref 34). Massy
(Ref 38), Fortier and Solomon (Ref 39) and King (Ref 40) also are related to
partitioning, selecting and apportioning.
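
The stepwise selection described above can be illustrated by a minimal
sketch. It assumes that the "best" candidate at each step is the one giving
the smallest residual sum of squares, and it omits any entry or removal tests
that Ref 4 may apply. The function names, the candidate signals and the data
are hypothetical and serve only as an illustration; the candidates are assumed
to be linearly independent.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def least_squares(X, y):
    """Least squares coefficients for y(t) ~ sum_j b[j] * X[j][t] (normal equations)."""
    A = [[sum(u * v for u, v in zip(xi, xj)) for xj in X] for xi in X]
    c = [sum(u * v for u, v in zip(xi, y)) for xi in X]
    return solve(A, c)


def residual_ss(X, y, b):
    """Residual sum of squares of the fitted linear model."""
    return sum((y[t] - sum(bj * xj[t] for bj, xj in zip(b, X))) ** 2
               for t in range(len(y)))


def forward_select(Z, y, p):
    """Pick p candidate signals from Z one at a time (assumed criterion: smallest RSS)."""
    chosen = []
    for _ in range(p):
        best_k, best_rss = None, float("inf")
        for k in range(len(Z)):
            if k in chosen:
                continue
            X = [Z[i] for i in chosen + [k]]
            rss = residual_ss(X, y, least_squares(X, y))
            if rss < best_rss:
                best_k, best_rss = k, rss
        chosen.append(best_k)
    # The final fit supplies the least squares estimators for the chosen signals.
    return chosen, least_squares([Z[i] for i in chosen], y)


# Hypothetical candidate signals Z_k(t) and a noise-free composite signal.
T = range(20)
Z = [[float(t) for t in T],       # Z_0(t) = t
     [float(t * t) for t in T],   # Z_1(t) = t^2
     [(-1.0) ** t for t in T],    # Z_2(t) = (-1)^t
     [1.0 for _ in T]]            # Z_3(t) = 1
y = [3.0 * Z[1][t] - 2.0 * Z[2][t] for t in T]
chosen, coeffs = forward_select(Z, y, 2)
```

Because each step refits all coefficients, the estimators are obtained
together with the selection, which matches the remark that such procedures
simultaneously provide least squares estimators.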


With  the  exception  of  Ref  4  and  37,  this  related  literature  became 
known to the author only after the development of EIWLS. One may,
nevertheless, note the anticipation of: the iterative estimation procedures of
Ref 1 (IWLS) and the subsequent Ref 23, by Ref 20-22 and 28; the incorporation
of a priori information into the estimation procedures of Ref 1 (EIWLS in
summarized form) and the subsequent Ref 34, by Ref 31-33; and the procedure
for partitioning, selecting and apportioning of Ref 1 (EIWLS in summarized
form) and the subsequent Ref 34, by Ref 21.
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8. 


ILLUSTRATIVE  EXAMPLE* 


An  illustrative  example  concludes  the  paper.  Although  the  example  is 
described  in  detail  by  the  paper's  original  version  (see  the  footnote  on  page  i), 
only  its  input  and  output  are  summarized  now  for  simplicity.  For  the 
example: 

1. The 272 observations on each of the three channel linear models,
$M_1(t)$, $M_2(t)$ and $M_3(t)$, are constructed from simultaneous
observations on 14 known channel component signals using known
channel coefficients.

2. The 272 observations on each of the three independent channel
noises, $\epsilon_1(t)$, $\epsilon_2(t)$ and $\epsilon_3(t)$, are generated to be
uncorrelated with mean zero and known variance $\sigma^2(t)$.

3. The 272 observations on each of the three channel composite
signals, $Y_1(t)$, $Y_2(t)$ and $Y_3(t)$, are constructed by adding the
appropriate channel linear model and noise observations.

4.  The  coefficients  and  data  from  all  three  channels  are  properly 
arranged,  component  signals  are  partitioned  into  seven  subsets, 
and  seven  representative  component  signals  are  selected. 

5. The coefficients, $\beta_j$, in the new linear model are estimated via
least squares ($\beta_j(0)$), seven iterations of IWLS ($\beta_j(7)$),
incorporation of a priori information ($\beta_j^*$), and the optimum weighted
least squares ($\beta_{0j}$) for $j = 1, 2, \ldots, 7$.

6.  The  corresponding  estimators,  (t),  Y^  (t),  Y^  (t)  and 

*0k  (t).  °f  (t)  and  M^  (t )  are  computed  for  k  =  1,  2,  3. 
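
The construction of the example can be imitated on a small scale by the
following sketch for a single channel. Everything specific in it is an
assumption made for illustration: the three component signals, the
coefficients, the variance function and the random seed are hypothetical, and
the reweighting loop uses a moving average of squared residuals as its
variance estimate, which is a generic choice and not necessarily the variance
estimator of IWLS. Only the least squares and known-variance weighted least
squares steps follow directly from the description above.

```python
import math
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def weighted_ls(X, y, w):
    """Minimize sum_t w[t] * (y[t] - sum_j b[j] * X[j][t])**2 over b."""
    n = len(y)
    A = [[sum(w[t] * xi[t] * xj[t] for t in range(n)) for xj in X] for xi in X]
    c = [sum(w[t] * xi[t] * y[t] for t in range(n)) for xi in X]
    return solve(A, c)


def fitted(X, b):
    """Values of the linear model sum_j b[j] * X[j][t] for every t."""
    return [sum(bj * xj[t] for bj, xj in zip(b, X)) for t in range(len(X[0]))]


def iterate_wls(X, y, iterations, half_width=15):
    """Generic reweighting loop; the moving-average variance estimate is an assumption."""
    n = len(y)
    b = weighted_ls(X, y, [1.0] * n)  # iteration 0: ordinary least squares
    for _ in range(iterations):
        fit = fitted(X, b)
        sq = [(y[t] - fit[t]) ** 2 for t in range(n)]
        var = []
        for t in range(n):
            window = sq[max(0, t - half_width):t + half_width + 1]
            var.append(max(sum(window) / len(window), 1e-12))
        b = weighted_ls(X, y, [1.0 / v for v in var])
    return b


random.seed(0)                                   # hypothetical seed
n = 272                                          # observations per channel
X = [[1.0] * n,                                  # hypothetical component signals
     [math.sin(2.0 * math.pi * t / 68.0) for t in range(n)],
     [t / n for t in range(n)]]
beta = [2.0, -1.0, 0.5]                          # hypothetical known coefficients
sigma = [0.1 + 0.9 * t / n for t in range(n)]    # known standard deviation, varying with t
M = fitted(X, beta)                              # linear model observations
Y = [M[t] + random.gauss(0.0, sigma[t]) for t in range(n)]  # composite signal

b_ls = weighted_ls(X, Y, [1.0] * n)                       # least squares
b_iwls = iterate_wls(X, Y, 7)                             # seven reweighting iterations
b_opt = weighted_ls(X, Y, [1.0 / s ** 2 for s in sigma])  # weights from known variance
```

Comparing `b_ls`, `b_iwls` and `b_opt` with `beta` gives the same kind of
comparison that the example makes between the least squares, iterated and
optimum weighted least squares estimators.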


Graphs  of  the  three  channel  linear  models  and  composite  signals  are 
presented  in  Figures  1,  2  and  3.  Tables  1  and  2  show  the  actual  coefficients 
in  the  channel  and  new  linear  models  and  their  EIWLS  estimators,  and  a 
summary  of  information  regarding  estimators  of  coefficients  in  the  new 
linear model, respectively. The latter indicates that: (1) the seventh IWLS
estimators are, mainly, both near-optimum and superior to the least squares
estimators; (2) the majority of change in the IWLS estimators has occurred
by  the  third  iteration;  and  (3)  the  relative  change  between  the  sixth 


*To  implement  EIWLS  for  the  example,  Mr.  P.  L.  Hsu  developed  an 
experimental  computer  program  of  exceptional  quality. 
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Table  1.  Actual  Coefficients  in  the  Channel  and  New  Linear  Models  and  Their  E1WLS  Estimators 
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Table  2.  Summary  of  Information  Regarding  Estimators  of 
Coefficients  in  the  New  Linear  Model 


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Actual  Comparison  of  Estimators  of  the  Linear 
Model  for  Channel  2 


Figure 6. Actual Comparison of Estimators of the Linear
Model for Channel 3
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